A
matrix (plural matrices) is a rectangular table of elements that is
used for mathematical purposes. Put another way, a matrix in computer
graphics is a 2D array of values that are used primarily to perform
various operations on vectors. Direct3D and OpenGL traditionally expect
matrices of floating-point values, but HLSL and GLSL also support
matrices of other data types, such as integers and Booleans. Matrices
can be 4 × 4 (four rows and four columns), 3 × 4 (three rows and four
columns), 3 × 3 (three rows and three columns), and so forth.

One of the primary uses
for a matrix is to perform a math operation on a vector to change that
vector in some meaningful way. Matrices can store rotation, scaling, and
translational (positional) information. A 3 × 3 matrix, that is, a
matrix (2D array) with three rows and three columns, is used to store
rotational and scaling information. When you apply this matrix to a
vector, a process known as vector-matrix transformation, you can
essentially rotate the vector or scale it any way you want. A 4 × 4
matrix has this same information with the addition of positional
information in the last row of the 2D array. Figure 1 shows a visual of a matrix.

Figure 1. A 2D array as a matrix, where the last row stores the X, Y, and Z positional info.

When you transform a
vector by a 4 × 4 matrix, you can apply scaling, rotations, and
translations on any vector. Translation is the process of moving a
vector from one location to another. Since a vector and vertex can be
used the same way, you can transform the vertices of a 3D model using a
matrix to change the model’s position and orientation in the 3D world.
For example, let’s say you’ve created a 3D cube in an application such
as Softimage XSI. The position of the vertices is stored in what is
known as local or model space. This means the positions of the vertices
are not related to anything other than the application in which the
model was created. So if you create a box around the origin, you can
create something in XSI that looks like Figure 2.

Figure 2. A cube created in XSI.

Now let’s say you want to
use this new model that you’ve created in a game. Let’s also assume you
will be placing more than one box in your 3D scene. You have the option
of modeling the box in its unique position in XSI so that when the data
is loaded, the boxes and other objects will be in their correct
positions.

This method is inefficient and ineffective for the following reasons:

- It would be a waste of time to model an object more than once
throughout a scene just so you can have more than one instance of the
object.

- If the base object changes (let’s say you want spheres instead of
boxes), you’ll have to repeat the process all over again by deleting
all the objects you’ve created and starting again.

- What if the objects are dynamic and are supposed to move around the
scene? How can this happen in code? The solution is the purpose of this
discussion, as you will see.

- If the objects are made up of thousands of polygons, why load what is
essentially the same object multiple times? This can lead to wasted
memory and resources. If you have 100 instances of this object in a
scene, that is 99 more objects than you need if the objects are all
exactly the same.

- Current hardware supports hardware instancing, which generally means
drawing the same mesh multiple times throughout the scene with a single
draw call. If each object has its own unique vertex data, there is no
way to take advantage of this feature.

When
you model an object in model space, you only need to create an object
once. You can then use a matrix to set the position of each instance of
the object that is to appear in the scene. So, for example, you load the
model once and create 10 matrices. Each of these matrices represents
the instance of the object as it is to appear rotated, scaled, and
translated in that position. In other words, you still have 10 objects,
but those 10 objects all share the same model data. You simply change
the matrix before drawing the object, and the scene will render as
you’ve intended. This is the primary purpose of matrices.

Since each object can have
its own matrix, each object can move, rotate, and be sized (scaled)
independently of all other objects. When applying physics and collision
detection, you can take into account the forces acting upon an object in
relation to the world around it to create simulations that mimic what
we observe in nature. To move one of these box examples around in a 3D
scene, we simply change the X, Y, Z translation of the matrix that
represents that object. This matrix is known as the model matrix. It is
also sometimes referred to as the world matrix, as it defines where in
the world the object is positioned and how it is rotated and scaled.
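The scheme described above, one shared mesh plus one world matrix per instance, can be sketched in plain C++. The `Mat4` struct and the `makeWorldMatrix()` and `transformPoint()` helpers below are hypothetical stand-ins for `D3DXMATRIX` and its functions, assuming Direct3D’s row-vector convention in which the translation lives in the fourth row:

```cpp
#include <cassert>

// Minimal 4x4 matrix using Direct3D's row-vector convention:
// the X, Y, Z translation is stored in the fourth row.
struct Mat4 {
    float m[4][4];
};

// Build a world matrix: identity plus a translation in the fourth row.
Mat4 makeWorldMatrix(float x, float y, float z) {
    Mat4 w{};
    for (int i = 0; i < 4; ++i) w.m[i][i] = 1.0f;
    w.m[3][0] = x; w.m[3][1] = y; w.m[3][2] = z;
    return w;
}

// Transform a point (row vector, w = 1) by a matrix: p' = p * M.
void transformPoint(const Mat4& mtx, const float in[3], float out[3]) {
    for (int j = 0; j < 3; ++j)
        out[j] = in[0] * mtx.m[0][j] + in[1] * mtx.m[1][j]
               + in[2] * mtx.m[2][j] + 1.0f * mtx.m[3][j];
}
```

Ten instances of the cube would then be ten `Mat4` values sharing one copy of the vertex data; changing an instance’s matrix moves that instance without touching the others.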

Matrices as Cameras

Matrices can be used for other
effects as well. In 3D games there exists the idea of a virtual camera.
This camera is actually a matrix called the view matrix that is applied
to the vertices of a scene to rotate and position objects on a global
level to give the illusion of a camera moving around in the virtual
world. Take a look at Figure 3.
On the left is the scene located at the default origin. The middle is
the scene where the position of the view matrix has been moved forward
by 20 along the Z axis, and the right is the scene from the middle
rotated to the left around the Y axis. The combination of the model and
view matrix is called the model-view matrix, and it is used to position
and orient objects based on their own world transformation as well as
being manipulated by the view matrix to simulate a 3D camera effect.

Figure 3. Scene at the origin (left), translated (middle), and rotated (right).

Another type of matrix
called the projection matrix can further simulate a camera. This matrix
is used to add orthogonal or perspective projection to the objects being
rendered. Orthogonal projection essentially renders objects the same
size on the screen regardless of how far back they are from one another.
This is useful in 2D scenes or 2D elements such as menus since the
depth of each object can be used to ensure the visual ordering of
overlapping objects on the screen, but it is not realistic when
rendering 3D scenes. In nature, objects appear smaller with distance.
This is your perspective on the world around you. A perspective matrix
essentially simulates this effect by scaling objects smaller as they
move away from the virtual camera. The projection matrix also adds a
field of view to the camera and far and near clipping planes. The near
and far clipping planes of the projection matrix represent how far away
from the camera an object can be and still be considered visible.

By
combining the model, view, and projection matrices, you get the
model-view-projection (MVP) matrix. This matrix is commonly used in
vertex shaders to transform the incoming vertices before moving on to
the geometry shader (if one is present) and the pixel shader.

Keep
in mind that the model matrix is used to position, scale, and rotate
objects on an individual (personal) basis. The view matrix is used to
further adjust the vertices of the geometry in a scene to simulate a 3D
camera. The projection matrix is used to further simulate lens effects
for the 3D virtual camera.

When you transform a model
matrix from its local space, which is nothing more than the data you’ve
loaded from a file created by an application such as XSI, you are
converting the data from local space to world space. When you apply a
view matrix to that, you are converting the data to view space. When you
apply a projection matrix to that data, you are converting the data to
screen space. These spaces are known as transformation spaces.

The transformation space of a vertex is dictated by the matrices that have so far been applied to the vertex.

Direct3D
uses a left-handed coordinate system, while OpenGL uses a right-handed
system. In Direct3D you can change to a right-handed system, which
essentially reverses the direction of the positive Z axis.

Matrix Operations

Matrices
are very useful in 3D games. You can combine matrices together using a
process called matrix concatenation. Mathematically, this means
multiplying matrices together. In Direct3D 10 you can use the D3DXMATRIX
structure to represent a matrix, and you can use the multiplication
symbol (*) to concatenate matrices together. Therefore, the
model-view-projection matrix is essentially the model matrix times the
view matrix times the projection matrix. The result is a single matrix
that represents the combined effect of every matrix it is made up of. So, a
single vector-matrix transformation can be used to move a vertex in
local space to screen space. Optionally, you can transform the vectors
by each matrix individually, which is what some of the demos in this
book do in the beginning of the vertex shaders.
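Concatenation is plain matrix multiplication. As a rough illustration (not D3DX itself; `Mat4`, `multiply()`, and `translation()` are hypothetical helpers using the row-vector convention), note how multiplying two translation matrices yields a single matrix that applies both moves:

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX (row-vector convention).
struct Mat4 { float m[4][4]; };

Mat4 identity() {
    Mat4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Concatenation: the combined matrix applies a's transform first, then b's.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.m[3][0] = x; r.m[3][1] = y; r.m[3][2] = z;
    return r;
}
```

An MVP matrix is then just `multiply(multiply(model, view), projection)`, after which a single vector-matrix transform carries a vertex from local space to screen space.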

The DirectX
SDK documentation specifies 34 matrix-related functions, which is a lot
to cover all at once, especially considering that most of these
functions will not be used for any demos in this book. Here we will examine the functions relevant to the topics discussed in this
book. We will look at additional functions as they arise in demos. We
recommend that you read the DirectX SDK documentation for a brief
overview of each of these matrix-related functions so that when you do
need to use one, you can refer to the documentation and move on from
there.

To start, a matrix that will have no effect on a vector is known as an
identity matrix. Just like how adding a vector to another vector that
has all zeros for the X, Y, and Z axes will not affect the original
vector, transforming a vector by an identity matrix will have no effect
on that vector. An identity matrix can be thought of as a default
“empty” matrix. It is created by calling the D3DXMatrixIdentity()
function. This function takes a single parameter, the output address to
the matrix being set to an identity matrix. To test if a matrix is an
identity matrix you can call the function D3DXMatrixIsIdentity(), which takes as a parameter the matrix to test and returns true if the matrix is an identity matrix or false if it is not.
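The behavior of these two functions is easy to sketch in plain C++ (the `Mat4` struct and function names below are hypothetical equivalents, not the D3DX implementation):

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX.
struct Mat4 { float m[4][4]; };

// Equivalent of D3DXMatrixIdentity(): 1s on the diagonal, 0s elsewhere.
void matrixIdentity(Mat4* out) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out->m[i][j] = (i == j) ? 1.0f : 0.0f;
}

// Equivalent of D3DXMatrixIsIdentity(): true only if every element
// matches the identity pattern.
bool matrixIsIdentity(const Mat4& mtx) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (mtx.m[i][j] != ((i == j) ? 1.0f : 0.0f)) return false;
    return true;
}
```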

To create a view matrix, you can call the function D3DXMatrixLookAtLH() to create a left-handed coordinate system view matrix or D3DXMatrixLookAtRH() to create a right-handed version. The D3DXMatrixLookAtLH()
and RH functions take as parameters the output address of the matrix
being created by the function call, the position of the camera, the
location at which the camera is looking, and the direction that is
considered up.
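The construction behind the left-handed version can be sketched as follows. This is a plain-C++ illustration following the formula the SDK documentation gives for D3DXMatrixLookAtLH(); the `Vec3`/`Mat4` types and helper names are assumptions:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

Vec3 sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Left-handed look-at view matrix built from the camera position (eye),
// the point being looked at (at), and the up direction.
Mat4 lookAtLH(Vec3 eye, Vec3 at, Vec3 up) {
    Vec3 zaxis = normalize(sub(at, eye));      // camera forward
    Vec3 xaxis = normalize(cross(up, zaxis));  // camera right
    Vec3 yaxis = cross(zaxis, xaxis);          // true camera up
    Mat4 v{};
    v.m[0][0] = xaxis.x; v.m[0][1] = yaxis.x; v.m[0][2] = zaxis.x;
    v.m[1][0] = xaxis.y; v.m[1][1] = yaxis.y; v.m[1][2] = zaxis.y;
    v.m[2][0] = xaxis.z; v.m[2][1] = yaxis.z; v.m[2][2] = zaxis.z;
    v.m[3][0] = -dot(xaxis, eye);  // translation row moves the world
    v.m[3][1] = -dot(yaxis, eye);  // opposite to the camera position
    v.m[3][2] = -dot(zaxis, eye);
    v.m[3][3] = 1.0f;
    return v;
}
```

A camera at the origin looking straight down the positive Z axis produces the identity matrix, which is a quick sanity check for the construction.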

To multiply (concatenate) two matrices together you can use the * multiplication operator or the D3DXMatrixMultiply()
function, which takes as parameters the output address of the matrix
being created by this function call, the first matrix in the operation,
and the second matrix in the operation.

To set a matrix’s position (translation), you call the D3DXMatrixTranslation()
function, which takes as parameters the output address matrix and the
X, Y, and Z position that is being set in the matrix. The X, Y, and Z
positions are floating-point values.
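In terms of the matrix layout, this amounts to an identity matrix with the position written into the fourth row. A hypothetical plain-C++ equivalent (not the D3DX source) might look like this:

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX.
struct Mat4 { float m[4][4]; };

// Equivalent of D3DXMatrixTranslation(): identity with the X, Y, Z
// position stored in the fourth row (Direct3D's row-vector convention).
void matrixTranslation(Mat4* out, float x, float y, float z) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out->m[i][j] = (i == j) ? 1.0f : 0.0f;
    out->m[3][0] = x;
    out->m[3][1] = y;
    out->m[3][2] = z;
}
```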

A matrix can be rotated by calling the D3DXMatrixRotationAxis()
function, which takes as parameters the out matrix, the vector axis to
rotate around, and an angle amount to rotate by specified in radians
(not degrees). To use angles measured in degrees, you can use the
DirectX macro D3DXToRadian(degrees). To
use this macro, you send the degrees in the parameter, and during
compilation the macro will be replaced with the mathematical equation
necessary to change degrees to radians.

Other functions you can use to rotate a matrix are D3DXMatrixRotationX(), D3DXMatrixRotationY(), and D3DXMatrixRotationZ(). These functions rotate around a specific unit axis, while D3DXMatrixRotationAxis() takes a D3DXVECTOR3 that specifies an arbitrary axis to rotate around. Each of these rotation functions takes as parameters the out matrix and a floating-point angle defined in radians.
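As a concrete sketch of one of these, here is a plain-C++ version of a Z-axis rotation together with a degrees-to-radians helper like the D3DXToRadian() macro (the `Mat4` type and function names are hypothetical; the layout assumes Direct3D’s row-vector convention):

```cpp
#include <cassert>
#include <cmath>

struct Mat4 { float m[4][4]; };

// Equivalent of the D3DXToRadian() macro: degrees * pi / 180.
float toRadian(float degrees) {
    return degrees * 3.14159265358979f / 180.0f;
}

// Equivalent of D3DXMatrixRotationZ() for the row-vector convention:
// a row vector (x, y, 0) multiplied by this matrix rotates around Z.
Mat4 rotationZ(float radians) {
    float c = std::cos(radians), s = std::sin(radians);
    Mat4 r{};
    r.m[0][0] =  c; r.m[0][1] = s;
    r.m[1][0] = -s; r.m[1][1] = c;
    r.m[2][2] = 1.0f;
    r.m[3][3] = 1.0f;
    return r;
}
```

Rotating by 90 degrees sends the X axis onto the Y axis, which shows up directly in the first row of the resulting matrix.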

The last rotation functions are D3DXMatrixRotationQuaternion(), which builds a rotation matrix from a quaternion, and D3DXMatrixRotationYawPitchRoll().
Yaw is a rotation around the Y axis, pitch a rotation around the X
axis, and roll a rotation around the Z axis; the function applies the
rotations in the order roll first, followed by pitch and then yaw.
These terms should be familiar to anyone who has played or developed a
flight simulator game.

To scale a matrix you can call the D3DXMatrixScaling() function, which takes as parameters the out matrix and the X, Y, and Z scales to apply to the matrix.
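A scaling matrix simply carries the scale factors on its main diagonal. A hypothetical plain-C++ equivalent (not the D3DX source) would be:

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX.
struct Mat4 { float m[4][4]; };

// Equivalent of D3DXMatrixScaling(): the X, Y, Z scale factors sit on
// the main diagonal, so transforming a vector multiplies each component.
void matrixScaling(Mat4* out, float sx, float sy, float sz) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out->m[i][j] = 0.0f;
    out->m[0][0] = sx;
    out->m[1][1] = sy;
    out->m[2][2] = sz;
    out->m[3][3] = 1.0f;
}
```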

The last functions we will
discuss deal with the projection matrix. Although only the left-handed
versions of these functions will be discussed, keep in mind that each of
these functions has right-handed equivalents. The orthogonal and
perspective projection functions from the DirectX SDK are as follows: