
iPhone 3D Programming : Optimizing - Vertex Submission: Above and Beyond VBOs

11/13/2012 2:51:24 AM

1. Instruments

Of all the performance metrics, your frame rate (number of presentRenderbuffer calls per second) is the metric that you’ll want to measure most often. Counting frames in your own code is convenient and simple, but Apple’s Instruments tool (Figure 1) can accomplish the same thing, and much more, without requiring you to modify your application! The best part is that it’s included in the SDK that you already have.


Warning:

As with any performance analysis tool, beware of the Heisenberg effect. In this context, I’m referring to the fact that measuring performance can, in itself, affect performance. This has never been problematic for me, and it’s certainly not as bothersome as a Heisenbug (a bug that seems to vanish when you’re in the debugger).


Figure 1. Instruments


First, be sure that your Active SDK (upper-left corner of Xcode) is set to a Device configuration rather than a Simulator configuration. Next, click the Run menu, and select the Run with Performance Tool submenu. You should see an option for OpenGL ES (if not, you probably have your SDK set to Simulator).


Note:

Alternatively, you can run Instruments directly from /Developer/Applications/Instruments.app.


While your application is running, Instruments is constantly updating an EKG-like graph of your frame rate and various other metrics you may be interested in. Try clicking the information icon in the OpenGL ES panel on the left, and then click the Configure button. This allows you to pick and choose from a slew of performance metrics.

Instruments is a great tool for many aspects of performance analysis, not just OpenGL. I find that it’s particularly useful for detecting memory leaks. The documentation for Instruments is available on the iPhone developer site. I encourage you to read up on it and give it a whirl, even if you’re not experiencing any obvious performance issues.

2. Understand the CPU/GPU Split

Don’t forget that the bottleneck of your application may be on the CPU side rather than in the graphics processing unit (GPU). You can determine this by diving into your rendering code and commenting out all the OpenGL function calls. Rerun your app with Instruments, and observe the frame rate; if it’s unchanged, then the bottleneck is on the CPU.
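
Commenting out calls by hand gets tedious, so one option is to route them through a macro that can be switched off at build time. The following is only a sketch: PROFILE_CPU_ONLY, GL_CALL, and fake_draw are hypothetical names, and fake_draw stands in for a real entry point such as glDrawArrays so that the snippet compiles without the OpenGL ES headers.

```c
#include <assert.h>

/* Hypothetical build switch: set to 1 to strip every wrapped GL call. */
#define PROFILE_CPU_ONLY 1

#if PROFILE_CPU_ONLY
#define GL_CALL(call) ((void)0)
#else
#define GL_CALL(call) (call)
#endif

/* Stand-in for a real OpenGL ES entry point such as glDrawArrays. */
int g_draw_calls = 0;
void fake_draw(void) { ++g_draw_calls; }

/* Per-frame work: the CPU-side update always runs, the draw is optional. */
int render_frame(int animation_state)
{
    assert(animation_state >= 0);
    animation_state += 1;     /* CPU work stays in place */
    GL_CALL(fake_draw());     /* GPU submission vanishes when profiling */
    return animation_state;
}
```

A wrapper like this only suits calls whose results you don’t use; calls that return handles or status values need to stay in place.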

3. Vertex Submission: Above and Beyond VBOs

The manner in which you submit vertex data to OpenGL ES can have a huge impact on performance. The most obvious tip is to use vertex buffer objects (VBOs) whenever possible, because they eliminate costly memory transfers. VBOs don’t help as much with older devices, but using them is a good habit to get into.

3.1. Batch, Batch, Batch

VBO usage is just the tip of the iceberg. Another best practice that you’ll hear a lot about is draw call batching. The idea is simple: try to render as much as possible in as few draw calls as possible. Consider how you’d go about drawing a human head. Perhaps your initial code does something like Example 1.

Example 1. Highly unoptimized OpenGL ES sequence
glBindTexture(...);  // Bind the skin texture.
glDrawArrays(...);   // Render the head.
glDrawArrays(...);   // Render the nose.
glLoadMatrixfv(...); // Shift the model-view to the left side.
glDrawArrays(...);   // Render the left ear.

glBindTexture(...);  // Bind the eyeball texture.
glDrawArrays(...);   // Render the left eye.
glLoadMatrixfv(...); // Shift the model-view to the right side.

glBindTexture(...);  // Bind the skin texture.
glDrawArrays(...);   // Render the right ear.

glBindTexture(...);  // Bind the eyeball texture.
glDrawArrays(...);   // Render the right eye.
glLoadMatrixfv(...); // Shift the model-view to the center.

glBindTexture(...);  // Bind the lips texture.
glDrawArrays(...);   // Render the lips.

Right off the bat, you should notice that the head and nose can be “batched” into a single VBO. You can also do a bit of rearranging to reduce the number of texture binding operations. Example 2 shows the result after this tuning.

Example 2. OpenGL ES sequence after initial tuning
glBindTexture(...);  // Bind the skin texture.
glDrawArrays(...);   // Render the head and nose.
glLoadMatrixfv(...); // Shift the model-view to the left side.
glDrawArrays(...);   // Render the left ear.
glLoadMatrixfv(...); // Shift the model-view to the right side.
glDrawArrays(...);   // Render the right ear.

glBindTexture(...);  // Bind the eyeball texture.
glLoadMatrixfv(...); // Shift the model-view to the left side.
glDrawArrays(...);   // Render the left eye.
glLoadMatrixfv(...); // Shift the model-view to the right side.
glDrawArrays(...);   // Render the right eye.
glLoadMatrixfv(...); // Shift the model-view to the center.

glBindTexture(...);  // Bind the lips texture.
glDrawArrays(...);   // Render the lips.
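
The rearranging above was done by hand, but the same idea can be applied to a list of draw items at run time. Here is a minimal sketch, assuming a hypothetical DrawItem record that pairs a texture handle with a mesh index; it sorts the list by texture and counts how many binds an in-order walk would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw item: which texture it needs and which mesh to draw. */
typedef struct {
    unsigned texture;   /* texture handle, as passed to glBindTexture */
    int      mesh;      /* index into an application-defined mesh table */
} DrawItem;

static int compare_by_texture(const void *a, const void *b)
{
    unsigned ta = ((const DrawItem *)a)->texture;
    unsigned tb = ((const DrawItem *)b)->texture;
    return (ta > tb) - (ta < tb);
}

/* Sort in place so that items sharing a texture become adjacent. */
void sort_by_texture(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count > 1)
        qsort(items, count, sizeof *items, compare_by_texture);
}

/* Count how many texture binds an in-order walk of the list needs. */
size_t count_texture_binds(const DrawItem *items, size_t count)
{
    size_t binds = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || items[i].texture != items[i - 1].texture)
            ++binds;
    }
    return binds;
}
```

Feeding it the seven draws of Example 1 (skin, skin, skin, eyeball, skin, eyeball, lips) gives five binds before sorting and three afterward. Sorting changes the draw order, so this only suits geometry whose order doesn’t matter, such as opaque meshes drawn with depth testing.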

Try combing through the code again to see whether anything can be eliminated. Sure, you might be saving a little bit of memory by using a single VBO to represent the ear, but suppose it’s a rather small VBO. If you add two instances of the ear geometry to your existing “head and nose” VBO, you can eliminate the need for changing the model-view matrix, plus you can use fewer draw calls. Similar guidance applies to the eyeballs. Example 3 shows the result.

Example 3. OpenGL ES sequence after second pass of tuning
glBindTexture(...);  // Bind the skin texture.
glDrawArrays(...);   // Render the head and nose and ears.

glBindTexture(...);  // Bind the eyeball texture.
glDrawArrays(...);   // Render both eyes.

glBindTexture(...);  // Bind the lips texture.
glDrawArrays(...);   // Render the lips.
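
Folding the ears into the head’s VBO means baking each ear’s placement into the vertex data up front, rather than applying it through the model-view matrix. A rough sketch of that preprocessing step, assuming positions only and a pure translation (Vec3 and append_translated are hypothetical names, not part of OpenGL ES):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Append `count` positions from `src` to `dst`, moving each by `offset`.
   Returns the new number of vertices stored in `dst`. */
size_t append_translated(Vec3 *dst, size_t dst_count, size_t dst_capacity,
                         const Vec3 *src, size_t count, Vec3 offset)
{
    assert(dst_count + count <= dst_capacity);
    for (size_t i = 0; i < count; ++i) {
        dst[dst_count + i].x = src[i].x + offset.x;
        dst[dst_count + i].y = src[i].y + offset.y;
        dst[dst_count + i].z = src[i].z + offset.z;
    }
    return dst_count + count;
}
```

You would call it once per ear with a different offset, and then upload the combined array as a single VBO. A mirrored or rotated instance would need a full matrix transform, and its normals would need transforming too.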

By tweaking your texture coordinates and combining the skin texture with the eye and lip textures, you can reduce the rendering code to only two lines:

glBindTexture(...);  // Bind the atlas texture.
glDrawArrays(...);   // Render the head and nose and ears and eyes and lips.
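
The texture coordinate tweak amounts to squeezing each mesh’s original [0, 1] coordinates into the sub-rectangle that its image occupies in the atlas. A small sketch under that assumption (TexCoord, AtlasRegion, and atlas_remap are hypothetical names):

```c
#include <assert.h>

typedef struct { float u, v; } TexCoord;

/* Sub-rectangle of the atlas, in normalized [0, 1] atlas coordinates. */
typedef struct { float u0, v0, width, height; } AtlasRegion;

/* Map a coordinate authored against the original texture into the atlas. */
TexCoord atlas_remap(TexCoord original, AtlasRegion region)
{
    assert(original.u >= 0.0f && original.u <= 1.0f);
    assert(original.v >= 0.0f && original.v <= 1.0f);

    TexCoord mapped;
    mapped.u = region.u0 + original.u * region.width;
    mapped.v = region.v0 + original.v * region.height;
    return mapped;
}
```

The range checks reflect a real limitation: coordinates that rely on repeat wrapping can’t be remapped this way, because they would bleed into neighboring atlas regions.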


					  


Note:

Pixologic’s ZBrush application is a favorite with artists for generating texture atlases.


OK, I admit this example was rather contrived. Rarely does production code make linear sequences of OpenGL calls as I’ve done in these examples. Real-world code is usually organized into subroutines, with plenty of stuff going on between the draw calls. But the same principles apply. From the GPU’s perspective, your application is merely a linear sequence of OpenGL calls. If you think about your code in this distilled manner, potential optimizations can be easier to spot.

3.2. Interleaved Vertex Attributes

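You might hear the term interleaved data being thrown around in regard to OpenGL optimizations. It is indeed a good practice, but it’s actually nothing special: it simply means storing all the attributes of a vertex together, as an array of structures like the following, rather than as a separate array per attribute.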

struct Vertex {
    vec3 Position;
    vec3 Normal;
    vec2 TexCoord;
};

When we create the VBO, we populate it with an array of Vertex objects. When it comes time to render the geometry, we usually do something like Example 4.

Example 4. Using interleaved attributes
glBindBuffer(...);
GLsizei stride = sizeof(Vertex);

// With a VBO bound, the last argument is a byte offset cast to a pointer.
const GLvoid* normalOffset = (const GLvoid*) offsetof(Vertex, Normal);
const GLvoid* texCoordOffset = (const GLvoid*) offsetof(Vertex, TexCoord);

// ES 1.1
glVertexPointer(3, GL_FLOAT, stride, 0);
glNormalPointer(GL_FLOAT, stride, normalOffset);
glTexCoordPointer(2, GL_FLOAT, stride, texCoordOffset);

// ES 2.0
glVertexAttribPointer(positionAttrib, 3, GL_FLOAT, GL_FALSE, stride, 0);
glVertexAttribPointer(normalAttrib, 3, GL_FLOAT,
                      GL_FALSE, stride, normalOffset);
glVertexAttribPointer(texCoordAttrib, 2, GL_FLOAT,
                      GL_FALSE, stride, texCoordOffset);

OpenGL does not require you to arrange VBOs in the previous manner. For example, consider a small VBO with only three vertices. Instead of arranging it like this:

Position-Normal-TexCoord-Position-Normal-TexCoord-Position-Normal-TexCoord

you could lay it out like this:

Position-Position-Position-Normal-Normal-Normal-TexCoord-TexCoord-TexCoord

This is perfectly acceptable (but not advised); Example 5 shows the way you’d submit it to OpenGL.

Example 5. Suboptimal vertex layout
glBindBuffer(...);

// Byte offsets of the normal and texture coordinate blocks within the VBO.
const GLvoid* normalOffset = (const GLvoid*) (sizeof(vec3) * VertexCount);
const GLvoid* texCoordOffset =
    (const GLvoid*) (2 * sizeof(vec3) * VertexCount);

// ES 1.1
glVertexPointer(3, GL_FLOAT, sizeof(vec3), 0);
glNormalPointer(GL_FLOAT, sizeof(vec3), normalOffset);
glTexCoordPointer(2, GL_FLOAT, sizeof(vec2), texCoordOffset);

// ES 2.0
glVertexAttribPointer(positionAttrib, 3, GL_FLOAT,
                      GL_FALSE, sizeof(vec3), 0);
glVertexAttribPointer(normalAttrib, 3, GL_FLOAT,
                      GL_FALSE, sizeof(vec3), normalOffset);
glVertexAttribPointer(texCoordAttrib, 2, GL_FLOAT,
                      GL_FALSE, sizeof(vec2), texCoordOffset);

When you submit vertex data in this manner, you’re forcing the driver to reorder the data to make it amenable to the GPU.
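
To make the difference between the two layouts concrete, the sketch below computes where a given vertex’s attributes live in each one. It assumes 4-byte floats and plain C stand-ins for vec3 and vec2; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float x, y; } vec2;

typedef struct {
    vec3 Position;
    vec3 Normal;
    vec2 TexCoord;
} Vertex;

/* Interleaved: byte offset of the normal belonging to vertex `i`. */
size_t interleaved_normal_offset(size_t i)
{
    return i * sizeof(Vertex) + offsetof(Vertex, Normal);
}

/* Planar: all positions first, then all normals, then all texcoords. */
size_t planar_normal_offset(size_t i, size_t vertex_count)
{
    assert(i < vertex_count);
    return vertex_count * sizeof(vec3) + i * sizeof(vec3);
}

size_t planar_texcoord_offset(size_t i, size_t vertex_count)
{
    assert(i < vertex_count);
    return 2 * vertex_count * sizeof(vec3) + i * sizeof(vec2);
}
```

In the interleaved layout every attribute of a vertex sits within one sizeof(Vertex) span, whereas in the planar layout the distance between a vertex’s position and its normal grows with the vertex count.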

3.3. Optimize Your Vertex Format

One aspect of vertex layout you might be wondering about is the ordering of attributes. With OpenGL ES 2.0 and newer Apple devices, the order has little or no impact on performance (assuming you’re using interleaved data). On first- and second-generation iPhones, Apple recommends the following order:

  1. Position

  2. Normal

  3. Color

  4. Texture coordinate (first stage)

  5. Texture coordinate (second stage)

  6. Point size

  7. Bone weight

  8. Bone index

Don’t forget there are other types you can use. For example, floating point is often overkill for color, since colors usually don’t need as much precision as other attributes.

// ES 1.1
// Lazy iPhone developer:
glColorPointer(4, GL_FLOAT, sizeof(vertex), offset);

// Rock Star iPhone developer!
glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(vertex), offset); 

// ES 2.0
// Lazy:
glVertexAttribPointer(color, 4, GL_FLOAT, GL_FALSE, stride, offset);

// Rock star! (normalized = GL_TRUE maps 0..255 to 0.0..1.0)
glVertexAttribPointer(color, 4, GL_UNSIGNED_BYTE, 
                      GL_TRUE, stride, offset);
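
Switching to bytes also means converting your color data before it goes into the VBO. Here is one way that might look, assuming floating-point channels in the range [0, 1]; Color4ub and pack_color are hypothetical names.

```c
#include <assert.h>

typedef struct { unsigned char r, g, b, a; } Color4ub;

/* Clamp a channel to [0, 1] and round it to the nearest byte value. */
static unsigned char to_byte(float channel)
{
    if (channel < 0.0f) channel = 0.0f;
    if (channel > 1.0f) channel = 1.0f;
    return (unsigned char)(channel * 255.0f + 0.5f);
}

/* Pack four float channels into 4 bytes instead of 16. */
Color4ub pack_color(float r, float g, float b, float a)
{
    assert(sizeof(Color4ub) == 4);
    Color4ub packed = { to_byte(r), to_byte(g), to_byte(b), to_byte(a) };
    return packed;
}
```

ES 1.1 always rescales unsigned byte colors back to the 0.0 to 1.0 range; in ES 2.0 this happens only when the attribute’s normalized flag is GL_TRUE.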


Warning:

Don’t use GL_FIXED. Because of the iPhone’s architecture, fixed-point numbers actually require more processing than floating-point numbers. Fixed-point is available only to comply with the Khronos specification.


Apple recommends aligning vertex attributes in memory according to their native alignment. For example, a 4-byte float should be aligned on a 4-byte boundary. Sometimes you can deal with this by adding padding to your vertex format:

struct Vertex {
    vec3 Position;
    unsigned char Luminance;
    unsigned char Alpha;
    unsigned short Padding;
};
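
If you want to double-check a vertex format like this one, a few offset and size checks go a long way. The sketch below assumes 4-byte floats and a plain C stand-in for vec3; is_aligned is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

typedef struct {
    vec3 Position;
    unsigned char Luminance;
    unsigned char Alpha;
    unsigned short Padding;
} Vertex;

/* Returns 1 when `offset` is a multiple of `alignment`, 0 otherwise. */
int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return offset % alignment == 0;
}
```

Checking sizeof(Vertex) as well as the individual offsets matters, because the stride decides whether the second and later vertices stay aligned.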

3.4. Use the Best Topology and Indexing

Apple’s general advice (at the time of this writing) is to prefer GL_TRIANGLE_STRIP over GL_TRIANGLES. Strips require fewer vertices but usually at the cost of more draw calls. Sometimes you can reduce the number of draw calls by introducing degenerate triangles into your vertex buffer.
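
The usual trick for joining two strips is to repeat the last index of the first strip and the first index of the second, which produces zero-area triangles that the GPU discards. The following sketch assumes indexed strips with 16-bit indices; stitch_strips is a hypothetical helper, not part of OpenGL ES or the PowerVR SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Join strips `a` and `b` into `out` using degenerate triangles.
   Returns the number of indices written. */
size_t stitch_strips(const unsigned short *a, size_t a_count,
                     const unsigned short *b, size_t b_count,
                     unsigned short *out, size_t out_capacity)
{
    assert(a_count >= 3 && b_count >= 3);
    /* An odd-length first strip needs one extra index to keep winding. */
    size_t needed = a_count + 2 + (a_count & 1) + b_count;
    assert(needed <= out_capacity);

    size_t n = 0;
    for (size_t i = 0; i < a_count; ++i)
        out[n++] = a[i];
    out[n++] = a[a_count - 1];   /* repeat last index of first strip */
    out[n++] = b[0];             /* repeat first index of second strip */
    if (a_count & 1)
        out[n++] = b[0];         /* parity fix so `b` starts on an even slot */
    for (size_t i = 0; i < b_count; ++i)
        out[n++] = b[i];

    assert(n == needed);
    return n;
}
```

For example, stitching {0, 1, 2, 3} and {4, 5, 6, 7} yields {0, 1, 2, 3, 3, 4, 4, 5, 6, 7}, which can go out in a single draw call at the cost of two extra indices.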

Strips versus separate triangles, indexed versus nonindexed; these all have trade-offs. You’ll find that many developers have strong opinions, and you’re welcome to review all the endless debates on the forums. In the end, experimentation is the only reliable way to determine the best tessellation strategy for your unique situation.

Imagination Technologies provides code for converting lists into strips. Look for PVRTTriStrip.cpp in the OpenGL ES 1.1 version of the PowerVR SDK. It also provides a sample app to show it off (Demos/OptimizeMesh).