1. Instruments
Of all the performance metrics, your
frame rate (the number of
presentRenderbuffer calls per second) is the one
that you’ll want to measure most often. Adding your own frame counter to the
application is convenient and simple, but Apple’s Instruments tool (Figure 9-1)
can accomplish the same thing, and much more, without requiring you to modify
your application!
The best part is that it’s included in the SDK that you
already have.
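For comparison, the in-app approach boils down to counting frames over a time window. Here is a minimal sketch in plain C; FrameCounter and its two functions are hypothetical names of my own, not part of any Apple or OpenGL API, and the timestamp is assumed to come from whatever monotonic clock you have on hand.

```c
#include <assert.h>

/* Hypothetical helper: counts frames and reports frames per second
   once per one-second measurement window. */
typedef struct {
    double windowStart; /* time (in seconds) when the current window began */
    int    frames;      /* frames presented so far in the current window */
    double fps;         /* most recent completed measurement */
} FrameCounter;

void FrameCounterInit(FrameCounter* fc, double now)
{
    assert(fc != 0);
    fc->windowStart = now;
    fc->frames = 0;
    fc->fps = 0.0;
}

/* Call once per frame, right after presenting the renderbuffer.
   'now' is a monotonically increasing timestamp in seconds.
   Returns the latest frame rate (0.0 until the first window completes). */
double FrameCounterTick(FrameCounter* fc, double now)
{
    assert(fc != 0);
    fc->frames++;
    double elapsed = now - fc->windowStart;
    if (elapsed >= 1.0) {
        fc->fps = fc->frames / elapsed;
        fc->frames = 0;
        fc->windowStart = now;
    }
    return fc->fps;
}
```

Every line of this has to live inside your render loop, which is exactly the kind of modification that Instruments lets you avoid.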
Warning:
As with any performance analysis tool, beware
of the Heisenberg effect. In this context, I’m referring to the fact
that measuring performance can, in itself, affect performance. This has
never been problematic for me, and it’s certainly not as bothersome as a
Heisenbug (a bug that seems to vanish when you’re in the
debugger).
First, be sure that your Active
SDK (upper-left corner of Xcode) is set to a Device
configuration rather than a Simulator configuration. Next, click the Run
menu, and select the Run with Performance Tool
submenu. You should see an option for OpenGL ES (if not, you probably have
your SDK set to Simulator).
Note:
Alternatively, you can run Instruments
directly from
/Developer/Applications/Instruments.app.
While your application is running, Instruments
is constantly updating an EKG-like graph of your frame rate and various
other metrics you may be interested in. Try clicking the information icon
in the OpenGL ES panel on the left, and then click the
Configure button. This allows you to pick and choose
from a slew of performance metrics.
Instruments is a great tool for many aspects of
performance analysis, not just OpenGL. I find that it’s particularly
useful for detecting memory leaks. The documentation for Instruments is
available on the iPhone developer site. I encourage you to read up on it
and give it a whirl, even if you’re not experiencing any obvious
performance issues.
2. Understand the CPU/GPU Split
Don’t forget that the bottleneck of your
application may be on the CPU side rather than in the graphics processing
unit (GPU). You can determine this by diving into your rendering code and
commenting out all the OpenGL function calls. Rerun your app with
Instruments, and observe the frame rate; if it’s unchanged, then the
bottleneck is on the CPU.
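The comparison itself is simple enough to write down. The sketch below assumes you have noted two frame rates from Instruments, one with the OpenGL calls in place and one with them commented out; ClassifyBottleneck and the tolerance parameter are illustrative names of my own, and the right tolerance is a judgment call rather than a fixed rule.

```c
#include <assert.h>

typedef enum {
    BOTTLENECK_CPU, /* frame rate did not move when the GL calls were removed */
    BOTTLENECK_GL   /* frame rate changed: look at the OpenGL side (driver or GPU) */
} Bottleneck;

/* 'tolerance' is the relative change that still counts as "unchanged",
   for example 0.05 for five percent. */
Bottleneck ClassifyBottleneck(double fpsWithGL, double fpsWithoutGL,
                              double tolerance)
{
    assert(fpsWithGL > 0.0 && tolerance >= 0.0);
    double change = fpsWithoutGL - fpsWithGL;
    if (change < 0.0) {
        change = -change;
    }
    return (change / fpsWithGL <= tolerance) ? BOTTLENECK_CPU : BOTTLENECK_GL;
}
```

For instance, going from 30 to 30.5 frames per second is noise, so the app is probably CPU-bound; going from 30 to 60 says the OpenGL work was holding you back.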
3. Vertex Submission: Above and Beyond VBOs
The manner in which you submit vertex data to
OpenGL ES can have a huge impact on performance. The most obvious optimization
is to use vertex buffer objects (VBOs) whenever possible, because they
eliminate costly memory transfers. VBOs don’t help
as much with older devices, but using them is a good habit to get
into.
3.1. Batch, Batch, Batch
VBO usage is just the tip of the iceberg.
Another best practice that you’ll hear a lot about is draw
call batching. The idea is simple: try to render as much as
possible in as few draw calls as possible. Consider how you’d go about
drawing a human head. Perhaps your initial code does something like
Example 1.
Example 1. Highly unoptimized OpenGL ES sequence
glBindTexture(...); // Bind the skin texture.
glDrawArrays(...); // Render the head.
glDrawArrays(...); // Render the nose.
glLoadMatrixfv(...); // Shift the model-view to the left side.
glDrawArrays(...); // Render the left ear.
glBindTexture(...); // Bind the eyeball texture.
glDrawArrays(...); // Render the left eye.
glLoadMatrixfv(...); // Shift the model-view to the right side.
glBindTexture(...); // Bind the skin texture.
glDrawArrays(...); // Render the right ear.
glBindTexture(...); // Bind the eyeball texture.
glDrawArrays(...); // Render the right eye.
glLoadMatrixfv(...); // Shift the model-view to the center.
glBindTexture(...); // Bind the lips texture.
glDrawArrays(...); // Render the lips.
Right off the bat, you should notice that the
head and nose can be “batched” into a single VBO. You can also do a bit
of rearranging to reduce the number of texture binding operations. Example 2 shows the result after this tuning.
Example 2. OpenGL ES sequence after initial tuning
glBindTexture(...); // Bind the skin texture.
glDrawArrays(...); // Render the head and nose.
glLoadMatrixfv(...); // Shift the model-view to the left side.
glDrawArrays(...); // Render the left ear.
glLoadMatrixfv(...); // Shift the model-view to the right side.
glDrawArrays(...); // Render the right ear.
glBindTexture(...); // Bind the eyeball texture.
glLoadMatrixfv(...); // Shift the model-view to the left side.
glDrawArrays(...); // Render the left eye.
glLoadMatrixfv(...); // Shift the model-view to the right side.
glDrawArrays(...); // Render the right eye.
glLoadMatrixfv(...); // Shift the model-view to the center.
glBindTexture(...); // Bind the lips texture.
glDrawArrays(...); // Render the lips.
Try combing through the code again to see
whether anything can be eliminated. Sure, you might be saving a little
bit of memory by using a single VBO to represent the ear, but suppose
it’s a rather small VBO. If you add two instances of the ear geometry to
your existing “head and nose” VBO, you can eliminate the need for
changing the model-view matrix, plus you can use fewer draw calls.
Similar guidance applies to the eyeballs. Example 3
shows the result.
Example 3. OpenGL ES sequence after second pass of tuning
glBindTexture(...); // Bind the skin texture.
glDrawArrays(...); // Render the head and nose and ears.
glBindTexture(...); // Bind the eyeball texture.
glDrawArrays(...); // Render both eyes.
glBindTexture(...); // Bind the lips texture.
glDrawArrays(...); // Render the lips.
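Folding the ears into the head VBO means baking each ear's placement into its vertex positions ahead of time. Here is a rough sketch of that step in plain C. AppendTranslated and Vec3 are hypothetical names, and for simplicity I'm assuming each instance is placed with a pure translation (a real right ear would likely be mirrored as well).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Appends 'count' positions from 'src' to 'dst', starting at index 'dstCount'
   and adding 'offset' to each one so the copy already sits in its final place.
   Returns the new number of vertices stored in 'dst'. */
size_t AppendTranslated(Vec3* dst, size_t dstCount, size_t dstCapacity,
                        const Vec3* src, size_t count, Vec3 offset)
{
    assert(dst != NULL && src != NULL);
    assert(dstCount + count <= dstCapacity);
    for (size_t i = 0; i < count; ++i) {
        dst[dstCount + i].x = src[i].x + offset.x;
        dst[dstCount + i].y = src[i].y + offset.y;
        dst[dstCount + i].z = src[i].z + offset.z;
    }
    return dstCount + count;
}
```

You would call this once for the left ear and once for the right ear while building the combined array, then upload the result as a single VBO. The cost is a few duplicated vertices; the payoff is no matrix changes and fewer draw calls.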
One more optimization remains: a texture atlas. By tweaking
your texture coordinates and combining the skin, eyeball, and
lip textures into a single image, you can reduce the rendering code to only two
lines:
glBindTexture(...); // Bind the atlas texture.
glDrawArrays(...); // Render the head and nose and ears and eyes and lips.
Note:
Pixologic’s
ZBrush application is a favorite with
artists for generating texture atlases.
OK, I admit this example was rather
contrived. Rarely does production code make linear sequences of OpenGL
calls as I’ve done in these examples. Real-world code is usually
organized into subroutines, with plenty of stuff going on between the
draw calls. But, the same principles apply. From the GPU’s perspective,
your application is merely a linear sequence of OpenGL calls. If you
think about your code in this distilled manner, potential optimizations
can be easier to spot.
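The "tweaking your texture coordinates" part of the atlas step deserves a quick illustration. Assuming each original texture ends up in a known rectangle of the atlas, remapping is just a scale and an offset. AtlasRect and AtlasRemap are hypothetical names for this sketch, and it ignores practical details such as padding between subimages to avoid filtering bleed.

```c
#include <assert.h>

typedef struct { float u, v; } TexCoord;

/* Bounds of one subimage within the atlas, in atlas texture space. */
typedef struct { float u0, v0, u1, v1; } AtlasRect;

/* Maps a coordinate in the original texture's 0..1 space
   into the corresponding spot inside the atlas rectangle. */
TexCoord AtlasRemap(TexCoord local, AtlasRect rect)
{
    assert(local.u >= 0.0f && local.u <= 1.0f);
    assert(local.v >= 0.0f && local.v <= 1.0f);
    TexCoord out;
    out.u = rect.u0 + local.u * (rect.u1 - rect.u0);
    out.v = rect.v0 + local.v * (rect.v1 - rect.v0);
    return out;
}
```

Run this over the texture coordinates of the eyes and lips once, at load time or in your art pipeline, and the whole head can share one texture binding. Note that this only works for coordinates that stay within 0..1; geometry that relies on texture repeat can't be moved into an atlas this way.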
3.2. Interleaved Vertex Attributes
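You might hear the term
interleaved data being thrown around in regard to
OpenGL optimizations. It is indeed a good practice, but it’s actually
nothing special: it simply means that all the attributes of a vertex are
stored next to each other in memory, as in a vertex structure like this: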
struct Vertex {
vec3 Position;
vec3 Normal;
vec2 TexCoord;
};
When we create the VBO, we populate it with
an array of Vertex objects. When it comes time to
render the geometry, we usually do something like Example 4.
Example 4. Using interleaved attributes
glBindBuffer(...);
GLsizei stride = sizeof(Vertex);
// ES 1.1
glVertexPointer(3, GL_FLOAT, stride, 0);
glNormalPointer(GL_FLOAT, stride, offsetof(Vertex, Normal));
glTexCoordPointer(2, GL_FLOAT, stride, offsetof(Vertex, TexCoord));
// ES 2.0
glVertexAttribPointer(positionAttrib, 3, GL_FLOAT, GL_FALSE, stride, 0);
glVertexAttribPointer(normalAttrib, 3, GL_FLOAT,
GL_FALSE, stride, offsetof(Vertex, Normal));
glVertexAttribPointer(texCoordAttrib, 2, GL_FLOAT,
GL_FALSE, stride, offsetof(Vertex, TexCoord));
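To see where the stride and offsets come from, it can help to compute them outside of OpenGL. The following sketch assumes vec3 and vec2 are plain structs of 4-byte floats (the definitions here are stand-ins of my own), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in vector types: tightly packed 4-byte floats. */
typedef struct { float x, y, z; } vec3;
typedef struct { float x, y; } vec2;

typedef struct {
    vec3 Position;
    vec3 Normal;
    vec2 TexCoord;
} Vertex;

/* Byte position of the normal belonging to vertex number 'vertexIndex'. */
size_t NormalByteOffset(size_t vertexIndex)
{
    return vertexIndex * sizeof(Vertex) + offsetof(Vertex, Normal);
}

/* Byte position of the texture coordinate belonging to the same vertex. */
size_t TexCoordByteOffset(size_t vertexIndex)
{
    return vertexIndex * sizeof(Vertex) + offsetof(Vertex, TexCoord);
}
```

With 4-byte floats, the stride is 32 bytes, normals start 12 bytes into each vertex, and texture coordinates start 24 bytes in. OpenGL performs this same arithmetic from the stride and offset that you pass in.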
OpenGL does not require you to arrange VBOs
in the previous manner. For example, consider a small VBO with only
three vertices. Instead of arranging it like this:
Position-Normal-TexCoord-Position-Normal-TexCoord-Position-Normal-TexCoord
you could lay it out like this:
Position-Position-Position-Normal-Normal-Normal-TexCoord-TexCoord-TexCoord
This is perfectly acceptable (but not
advised); Example 5 shows the way you’d submit
it to OpenGL.
Example 5. Suboptimal vertex layout
glBindBuffer(...);
// ES 1.1
glVertexPointer(3, GL_FLOAT, sizeof(vec3), 0);
glNormalPointer(GL_FLOAT, sizeof(vec3), sizeof(vec3) * VertexCount);
glTexCoordPointer(2, GL_FLOAT, sizeof(vec2),
2 * sizeof(vec3) * VertexCount);
// ES 2.0
glVertexAttribPointer(positionAttrib, 3, GL_FLOAT,
GL_FALSE, sizeof(vec3), 0);
glVertexAttribPointer(normalAttrib, 3, GL_FLOAT,
GL_FALSE, sizeof(vec3),
sizeof(vec3) * VertexCount);
glVertexAttribPointer(texCoordAttrib, 2, GL_FLOAT,
GL_FALSE, sizeof(vec2),
2 * sizeof(vec3) * VertexCount);
When you submit vertex data in this manner,
you’re forcing the driver to reorder the data to make it amenable to the
GPU.
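If your data arrives in separate arrays (from a model loader, say), you can do that reordering yourself, once, before creating the VBO. A minimal sketch, assuming stand-in vec3 and vec2 types made of plain floats and a hypothetical InterleaveVertices function:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in vector types: tightly packed 4-byte floats. */
typedef struct { float x, y, z; } vec3;
typedef struct { float x, y; } vec2;

typedef struct {
    vec3 Position;
    vec3 Normal;
    vec2 TexCoord;
} Vertex;

/* Gathers three separate attribute arrays into one interleaved array.
   'out' must have room for 'vertexCount' vertices. */
void InterleaveVertices(const vec3* positions, const vec3* normals,
                        const vec2* texCoords, Vertex* out,
                        size_t vertexCount)
{
    assert(positions != NULL && normals != NULL && texCoords != NULL);
    assert(out != NULL);
    for (size_t i = 0; i < vertexCount; ++i) {
        out[i].Position = positions[i];
        out[i].Normal = normals[i];
        out[i].TexCoord = texCoords[i];
    }
}
```

The resulting array can be handed to glBufferData in one shot and described with the interleaved pointers from Example 4.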
3.3. Optimize Your Vertex Format
One aspect of vertex layout you might be
wondering about is the ordering of attributes. With OpenGL ES 2.0 and
newer Apple devices, the order has little or no impact on performance
(assuming you’re using interleaved data). On first- and
second-generation iPhones, Apple recommends the following
order:
Texture coordinate (first stage)
Texture coordinate (second stage)
Don’t
forget that GL_FLOAT isn’t the only type you can use for vertex attributes.
For example, floating point is
often overkill for color, since colors usually don’t need as much
precision as other attributes.
// ES 1.1
// Lazy iPhone developer:
glColorPointer(4, GL_FLOAT, sizeof(vertex), offset);
// Rock Star iPhone developer!
glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(vertex), offset);
// ES 2.0
// Lazy:
glVertexAttribPointer(color, 4, GL_FLOAT, GL_FALSE, stride, offset);
// Rock star!
glVertexAttribPointer(color, 4, GL_UNSIGNED_BYTE,
GL_TRUE, stride, offset); // GL_TRUE maps 0..255 to 0.0..1.0
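If your colors start life as floats, you'll need to pack them into bytes before filling the vertex buffer. The sketch below shows one way to do it; ColorRGBA8 and PackColor are names I've made up for illustration. Note that in ES 2.0, byte colors should be submitted with the normalized argument set to GL_TRUE so that the shader sees values between 0 and 1.

```c
#include <assert.h>

/* Four bytes per color instead of the sixteen that four floats would take. */
typedef struct { unsigned char r, g, b, a; } ColorRGBA8;

/* Clamps a channel to 0..1 and rounds it to the nearest of 256 levels. */
unsigned char ChannelToByte(float c)
{
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (unsigned char)(c * 255.0f + 0.5f);
}

ColorRGBA8 PackColor(float r, float g, float b, float a)
{
    ColorRGBA8 out;
    out.r = ChannelToByte(r);
    out.g = ChannelToByte(g);
    out.b = ChannelToByte(b);
    out.a = ChannelToByte(a);
    return out;
}
```

That's a 75 percent reduction in the size of the color attribute, which adds up quickly in a large mesh.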
Warning:
Don’t use GL_FIXED.
Because of the iPhone’s architecture, fixed-point numbers actually
require more processing than floating-point
numbers. Fixed-point is available only to comply with the Khronos
specification.
Apple recommends aligning vertex attributes
in memory according to their native alignment. For example, a 4-byte
float should be aligned on a 4-byte boundary. Sometimes you can deal
with this by adding padding to your vertex format:
struct Vertex {
vec3 Position;
unsigned char Luminance;
unsigned char Alpha;
unsigned short Padding;
};
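You can check a layout like this with sizeof and offsetof. In the sketch below, vec3 is a stand-in struct of three 4-byte floats and IsAligned is a hypothetical helper. A C compiler would normally add the two trailing bytes on its own; spelling them out simply makes the layout explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in vector type: three tightly packed 4-byte floats. */
typedef struct { float x, y, z; } vec3;

typedef struct {
    vec3 Position;           /* bytes 0..11 */
    unsigned char Luminance; /* byte 12 */
    unsigned char Alpha;     /* byte 13 */
    unsigned short Padding;  /* bytes 14..15, unused */
} Vertex;

/* Returns nonzero if 'byteOffset' is a multiple of 'alignment'. */
int IsAligned(size_t byteOffset, size_t alignment)
{
    assert(alignment != 0);
    return byteOffset % alignment == 0;
}
```

Without the padding, a tightly packed vertex would occupy 14 bytes, so the second vertex's Position would begin at byte 14, which is not a multiple of 4. With it, every vertex starts on a 16-byte multiple and the floats stay aligned.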
3.4. Use the Best Topology and Indexing
Apple’s general advice (at the time of this
writing) is to prefer GL_TRIANGLE_STRIP over
GL_TRIANGLES. Strips require fewer vertices but
usually at the cost of more draw calls. Sometimes you can reduce the
number of draw calls by introducing degenerate triangles into your
vertex buffer.
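Here is a sketch of the degenerate-triangle trick, written against index arrays for clarity. StitchStrips is a hypothetical function, not something from Apple or the PowerVR SDK. It repeats the last index of the first strip and the first index of the second, which produces zero-area triangles that bridge the gap; if the first strip has an odd number of indices, one extra repeat keeps the winding of the second strip intact.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short Index; /* suitable for GL_UNSIGNED_SHORT indices */

/* Joins strips 'a' and 'b' into 'out' so that both can be drawn with a
   single GL_TRIANGLE_STRIP call. Returns the number of indices written. */
size_t StitchStrips(const Index* a, size_t aCount,
                    const Index* b, size_t bCount,
                    Index* out, size_t outCapacity)
{
    assert(a != NULL && b != NULL && out != NULL);
    assert(aCount >= 3 && bCount >= 3);
    assert(aCount + bCount + 2 + (aCount % 2) <= outCapacity);

    size_t n = 0;
    for (size_t i = 0; i < aCount; ++i) {
        out[n++] = a[i];
    }
    out[n++] = a[aCount - 1];     /* repeat the last index of the first strip */
    if (aCount % 2 != 0) {
        out[n++] = a[aCount - 1]; /* extra repeat preserves winding parity */
    }
    out[n++] = b[0];              /* repeat the first index of the second strip */
    for (size_t i = 0; i < bCount; ++i) {
        out[n++] = b[i];
    }
    return n;
}
```

Stitching the strips 0-1-2-3 and 4-5-6-7 yields 0-1-2-3-3-4-4-5-6-7: two extra indices in exchange for one fewer draw call. The degenerate triangles have zero area, so they produce no pixels, but the vertices still flow through the pipeline, so measure before you commit.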
Strips versus separate triangles, indexed
versus nonindexed; these all have trade-offs. You’ll find that many
developers have strong opinions, and you’re welcome to review all the
endless debates on the forums. In the end, experimentation is the only
reliable way to determine the best tessellation strategy for your unique
situation.
Imagination Technologies provides code for
converting lists into strips. Look for
PVRTTriStrip.cpp in the OpenGL ES 1.1 version of
the PowerVR SDK. It also provides a sample app to show
it off (Demos/OptimizeMesh).