A
matrix (plural matrices) is a rectangular table of elements that is
used for mathematical purposes. Put another way, a matrix in computer
graphics is a 2D array of values that are used primarily to perform
various operations on vectors. Direct3D and OpenGL traditionally expect
matrices of floating-point values, but HLSL and GLSL also support
matrices of other data types, such as integers and Booleans. Matrices
can be 4 × 4 (four rows and four columns), 3 × 4 (three rows and four
columns), 3 × 3 (three rows and three columns), and so forth.

One of the primary uses
for a matrix is to perform a math operation on a vector to change that
vector in some meaningful way. Matrices can store rotation, scaling, and
translational (positional) information. A 3 × 3 matrix, that is, a
matrix (2D array) with three rows and three columns, is used to store
rotational and scaling information. When you apply this matrix to a
vector, a process known as vector-matrix transformation, you can
essentially rotate the vector or scale it any way you want. A 4 × 4
matrix has this same information with the addition of positional
information in the last row of the 2D array. Figure 1 shows a visual of a matrix.

Figure 1. A 2D array as a matrix, where the last row stores the X, Y, and Z positional info.

When you transform a
vector by a 4 × 4 matrix, you can apply scaling, rotations, and
translations on any vector. Translation is the process of moving a
vector from one location to another. Since a vector and vertex can be
used the same way, you can transform the vertices of a 3D model using a
matrix to change the model’s position and orientation in the 3D world.
For example, let’s say you’ve created a 3D cube in an application such
as Softimage XSI. The position of the vertices is stored in what is
known as local or model space. This means the positions of the vertices
are not related to anything other than the application in which the
model was created. So if you create a box around the origin, you can
create something in XSI that looks like Figure 2.

Figure 2. A cube created in XSI.

Now let’s say you want to
use this new model that you’ve created in a game. Let’s also assume you
will be placing more than one box in your 3D scene. You have the option
of modeling the box in its unique position in XSI so that when the data
is loaded, the boxes and other objects will be in their correct
positions.

This method is inefficient and ineffective for the following reasons:

- It would be a waste of time to model an object more than once
throughout a scene just so you can have more than one instance of the
object.

- If the base object changes (let’s say you want spheres instead of
boxes), you’ll have to repeat the process all over again by deleting
all the objects you’ve created and starting again.

- What if the objects are dynamic and are supposed to move around the
scene? How can this happen in code? The solution is the purpose of this
discussion, as you will see.

- If the objects are made up of thousands of polygons, why load what is
essentially the same object multiple times? This can lead to wasted
memory and resources. If you have 100 instances of this object in a
scene, that is 99 more objects than you need if the objects are all
exactly the same.

- Current hardware supports hardware instancing, which generally means
drawing the same mesh multiple times throughout the scene with a single
draw call. If each object has its own unique vertex data, there is no
way to take advantage of this feature.

When
you model an object in model space, you only need to create an object
once. You can then use a matrix to set the position of each instance of
the object that is to appear in the scene. So, for example, you load the
model once and create 10 matrices. Each of these matrices represents
the instance of the object as it is to appear rotated, scaled, and
translated in that position. In other words, you still have 10 objects,
but those 10 objects all share the same model data. You simply change
the matrix before drawing the object, and the scene will render as
you’ve intended. This is the primary purpose of matrices.

Since each object can have
its own matrix, each object can move, rotate, and be sized (scaled)
independently of all other objects. When applying physics and collision
detection, you can take into account the forces acting upon an object in
relation to the world around it to create simulations that mimic what
we observe in nature. To move one of these box examples around in a 3D
scene, we simply change the X, Y, Z translation of the matrix that
represents that object. This matrix is known as the model matrix. It is
also sometimes referred to as the world matrix, as it defines where in
the world the object is positioned and how it is rotated and scaled.
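The scheme described above, one shared mesh plus one world matrix per instance, can be sketched in plain C++. The `Mat4` struct and the `makeWorldMatrix()` and `transformPoint()` helpers below are hypothetical stand-ins for `D3DXMATRIX` and its functions, assuming Direct3D’s row-vector convention in which the translation lives in the fourth row:

```cpp
#include <cassert>

// Minimal 4x4 matrix using Direct3D's row-vector convention:
// the X, Y, Z translation is stored in the fourth row.
struct Mat4 {
    float m[4][4];
};

// Build a world matrix: identity plus a translation in the fourth row.
Mat4 makeWorldMatrix(float x, float y, float z) {
    Mat4 w{};
    for (int i = 0; i < 4; ++i) w.m[i][i] = 1.0f;
    w.m[3][0] = x; w.m[3][1] = y; w.m[3][2] = z;
    return w;
}

// Transform a point (row vector, w = 1) by a matrix: p' = p * M.
void transformPoint(const Mat4& mtx, const float in[3], float out[3]) {
    for (int j = 0; j < 3; ++j)
        out[j] = in[0] * mtx.m[0][j] + in[1] * mtx.m[1][j]
               + in[2] * mtx.m[2][j] + 1.0f * mtx.m[3][j];
}
```

Ten instances of the cube would then be ten `Mat4` values sharing one copy of the vertex data; changing an instance’s matrix moves that instance without touching the others.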

Matrices as Cameras

Matrices can be used for other
effects as well. In 3D games there exists the idea of a virtual camera.
This camera is actually a matrix called the view matrix that is applied
to the vertices of a scene to rotate and position objects on a global
level to give the illusion of a camera moving around in the virtual
world. Take a look at Figure 3.
On the left is the scene located at the default origin. The middle is
the scene where the position of the view matrix has been moved forward
by 20 along the Z axis, and the right is the scene from the middle
rotated to the left around the Y axis. The combination of the model and
view matrix is called the model-view matrix, and it is used to position
and orient objects based on their own world transformation as well as
being manipulated by the view matrix to simulate a 3D camera effect.

Figure 3. Scene at the origin (left), translated (middle), and rotated (right).

Another type of matrix
called the projection matrix can further simulate a camera. This matrix
is used to add orthogonal or perspective projection to the objects being
rendered. Orthogonal projection essentially renders objects the same
size on the screen regardless of how far back they are from one another.
This is useful in 2D scenes or 2D elements such as menus since the
depth of each object can be used to ensure the visual ordering of
overlapping objects on the screen, but it is not realistic when
rendering 3D scenes. In nature, objects appear smaller with distance.
This is your perspective on the world around you. A perspective matrix
essentially simulates this effect by scaling objects smaller as they
move away from the virtual camera. The projection matrix also adds a
field of view to the camera and far and near clipping planes. The near
and far clipping planes of the projection matrix represent how far away
from the camera an object can be and still be considered visible.

By
combining the model, view, and projection matrices, you get the
model-view-projection (MVP) matrix. This matrix is commonly used in
vertex shaders to transform the incoming vertices before moving on to
the geometry shader (if one is present) and the pixel shader.

Keep
in mind that the model matrix is used to position, scale, and rotate
objects on an individual (personal) basis. The view matrix is used to
further adjust the vertices of the geometry in a scene to simulate a 3D
camera. The projection matrix is used to further simulate lens effects
for the 3D virtual camera.

When you transform a model
matrix from its local space, which is nothing more than the data you’ve
loaded from a file created by an application such as XSI, you are
converting the data from local space to world space. When you apply a
view matrix to that, you are converting the data to view space. When you
apply a projection matrix to that data, you are converting the data to
screen space. These spaces are known as transformation spaces.

The transformation space of a vertex is dictated by the matrices that have so far been applied to the vertex.

Direct3D
uses a left-handed coordinate system, while OpenGL uses a right-handed
system. In Direct3D you can change to a right-handed system, which
essentially reverses the direction of the positive Z axis.

Matrix Operations

Matrices
are very useful in 3D games. You can combine matrices together using a
process called matrix concatenation. Mathematically, this means
multiplying matrices together. In Direct3D 10 you can use the D3DXMATRIX
structure to represent a matrix, and you can use the multiplication
symbol (*) to concatenate matrices together. Therefore, the
model-view-projection matrix is essentially the model matrix times the
view matrix times the projection matrix. The result is a single matrix
that represents the combined effect of every matrix it is made up of. So, a
single vector-matrix transformation can be used to move a vertex in
local space to screen space. Optionally, you can transform the vectors
by each matrix individually, which is what some of the demos in this
book do in the beginning of the vertex shaders.
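Concatenation is plain matrix multiplication. As a rough illustration (not D3DX itself; `Mat4`, `multiply()`, and `translation()` are hypothetical helpers using the row-vector convention), note how multiplying two translation matrices yields a single matrix that applies both moves:

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX (row-vector convention).
struct Mat4 { float m[4][4]; };

Mat4 identity() {
    Mat4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Concatenation: the combined matrix applies a's transform first, then b's.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.m[3][0] = x; r.m[3][1] = y; r.m[3][2] = z;
    return r;
}
```

An MVP matrix is then just `multiply(multiply(model, view), projection)`, after which a single vector-matrix transform carries a vertex from local space to screen space.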

The DirectX
SDK documentation specifies 34 matrix-related functions, which is a lot
to cover all at once, especially considering that most of these
functions will not be used for any demos in this book. Here we will examine the functions relevant to the topics discussed in this
book. We will look at additional functions as they arise in demos. We
recommend that you read the DirectX SDK documentation for a brief
overview of each of these matrix-related functions so that when you do
need to use one, you can refer to the documentation and move on from
there.

To start, a matrix that will have no effect on a vector is known as an
identity matrix. Just like how adding a vector to another vector that
has all zeros for the X, Y, and Z axes will not affect the original
vector, transforming a vector by an identity matrix will have no effect
on that vector. An identity matrix can be thought of as a default
“empty” matrix. It is created by calling the D3DXMatrixIdentity()
function. This function takes a single parameter, the output address to
the matrix being set to an identity matrix. To test if a matrix is an
identity matrix you can call the function D3DXMatrixIsIdentity(), which takes as a parameter the matrix to test and returns true if the matrix is an identity matrix or false if it is not.
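The behavior of these two functions is easy to sketch in plain C++ (the `Mat4` struct and function names below are hypothetical equivalents, not the D3DX implementation):

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX.
struct Mat4 { float m[4][4]; };

// Equivalent of D3DXMatrixIdentity(): 1s on the diagonal, 0s elsewhere.
void matrixIdentity(Mat4* out) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out->m[i][j] = (i == j) ? 1.0f : 0.0f;
}

// Equivalent of D3DXMatrixIsIdentity(): true only if every element
// matches the identity pattern.
bool matrixIsIdentity(const Mat4& mtx) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (mtx.m[i][j] != ((i == j) ? 1.0f : 0.0f)) return false;
    return true;
}
```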

To create a view matrix, you can call the function D3DXMatrixLookAtLH() to create a left-handed coordinate system view matrix or D3DXMatrixLookAtRH() to create a right-handed version. The D3DXMatrixLookAtLH()
and RH functions take as parameters the output address of the matrix
being created by the function call, the position of the camera, the
location at which the camera is looking, and the direction that is
considered up.
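The construction behind the left-handed version can be sketched as follows. This is a plain-C++ illustration following the formula the SDK documentation gives for D3DXMatrixLookAtLH(); the `Vec3`/`Mat4` types and helper names are assumptions:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

Vec3 sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Left-handed look-at view matrix built from the camera position (eye),
// the point being looked at (at), and the up direction.
Mat4 lookAtLH(Vec3 eye, Vec3 at, Vec3 up) {
    Vec3 zaxis = normalize(sub(at, eye));      // camera forward
    Vec3 xaxis = normalize(cross(up, zaxis));  // camera right
    Vec3 yaxis = cross(zaxis, xaxis);          // true camera up
    Mat4 v{};
    v.m[0][0] = xaxis.x; v.m[0][1] = yaxis.x; v.m[0][2] = zaxis.x;
    v.m[1][0] = xaxis.y; v.m[1][1] = yaxis.y; v.m[1][2] = zaxis.y;
    v.m[2][0] = xaxis.z; v.m[2][1] = yaxis.z; v.m[2][2] = zaxis.z;
    v.m[3][0] = -dot(xaxis, eye);  // translation row moves the world
    v.m[3][1] = -dot(yaxis, eye);  // opposite to the camera position
    v.m[3][2] = -dot(zaxis, eye);
    v.m[3][3] = 1.0f;
    return v;
}
```

A camera at the origin looking straight down the positive Z axis produces the identity matrix, which is a quick sanity check for the construction.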

To multiply (concatenate) two matrices together you can use the * multiplication operator or the D3DXMatrixMultiply()
function, which takes as parameters the output address of the matrix
being created by this function call, the first matrix in the operation,
and the second matrix in the operation.

To set a matrix’s position (translation), you call the D3DXMatrixTranslation()
function, which takes as parameters the output address matrix and the
X, Y, and Z position that is being set in the matrix. The X, Y, and Z
positions are floating-point values.
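In terms of the matrix layout, this amounts to an identity matrix with the position written into the fourth row. A hypothetical plain-C++ equivalent (not the D3DX source) might look like this:

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX.
struct Mat4 { float m[4][4]; };

// Equivalent of D3DXMatrixTranslation(): identity with the X, Y, Z
// position stored in the fourth row (Direct3D's row-vector convention).
void matrixTranslation(Mat4* out, float x, float y, float z) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out->m[i][j] = (i == j) ? 1.0f : 0.0f;
    out->m[3][0] = x;
    out->m[3][1] = y;
    out->m[3][2] = z;
}
```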

A matrix can be rotated by calling the D3DXMatrixRotationAxis()
function, which takes as parameters the out matrix, the vector axis to
rotate around, and an angle amount to rotate by specified in radians
(not degrees). To use angles measured in degrees, you can use the
DirectX macro D3DXToRadian(degrees). To
use this macro, you send the degrees in the parameter, and during
compilation the macro will be replaced with the mathematical equation
necessary to change degrees to radians.

Other functions you can use to rotate a matrix are D3DXMatrixRotationX(), D3DXMatrixRotationY(), and D3DXMatrixRotationZ(). These functions rotate around a specific unit axis, while D3DXMatrixRotationAxis() takes a D3DXVECTOR3 that specifies an arbitrary axis to rotate around. Each of these rotation functions takes as parameters the out matrix and a floating-point angle defined in radians.
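As a concrete sketch of one of these, here is a plain-C++ version of a Z-axis rotation together with a degrees-to-radians helper like the D3DXToRadian() macro (the `Mat4` type and function names are hypothetical; the layout assumes Direct3D’s row-vector convention):

```cpp
#include <cassert>
#include <cmath>

struct Mat4 { float m[4][4]; };

// Equivalent of the D3DXToRadian() macro: degrees * pi / 180.
float toRadian(float degrees) {
    return degrees * 3.14159265358979f / 180.0f;
}

// Equivalent of D3DXMatrixRotationZ() for the row-vector convention:
// a row vector (x, y, 0) multiplied by this matrix rotates around Z.
Mat4 rotationZ(float radians) {
    float c = std::cos(radians), s = std::sin(radians);
    Mat4 r{};
    r.m[0][0] =  c; r.m[0][1] = s;
    r.m[1][0] = -s; r.m[1][1] = c;
    r.m[2][2] = 1.0f;
    r.m[3][3] = 1.0f;
    return r;
}
```

Rotating by 90 degrees sends the X axis onto the Y axis, which shows up directly in the first row of the resulting matrix.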

The last rotation functions are D3DXMatrixRotationQuaternion(), which builds a rotation matrix from a quaternion, and D3DXMatrixRotationYawPitchRoll().
Yaw is a rotation around the Y axis, pitch a rotation around the X
axis, and roll a rotation around the Z axis; the function applies the
rotations in the order roll first, followed by pitch and then yaw.
These terms should be familiar to anyone who has played or developed a
flight simulator game.

To scale a matrix you can call the D3DXMatrixScaling() function, which takes as parameters the out matrix and the X, Y, and Z scales to apply to the matrix.
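A scaling matrix simply carries the scale factors on its main diagonal. A hypothetical plain-C++ equivalent (not the D3DX source) would be:

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXMATRIX.
struct Mat4 { float m[4][4]; };

// Equivalent of D3DXMatrixScaling(): the X, Y, Z scale factors sit on
// the main diagonal, so transforming a vector multiplies each component.
void matrixScaling(Mat4* out, float sx, float sy, float sz) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out->m[i][j] = 0.0f;
    out->m[0][0] = sx;
    out->m[1][1] = sy;
    out->m[2][2] = sz;
    out->m[3][3] = 1.0f;
}
```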

The last functions we will
discuss deal with the projection matrix. Although only the left-handed
versions of these functions will be discussed, keep in mind that each of
these functions has right-handed equivalents. The orthogonal and
perspective projection functions from the DirectX SDK are as follows: