Instanced Rendering
There will probably be times
when you want to draw the same object many times. Imagine a fleet of
starships, or a field of grass. There could be thousands of copies of
what are essentially identical sets of geometry, modified only slightly
from instance to instance. A simple application might just loop over all
of the individual blades of grass in a field and render them
separately, calling glDrawArrays once
for each blade and perhaps updating a set of shader uniforms on each
iteration. Supposing each blade of grass were made up of a strip of four
triangles, the code might look something like Listing 2.
Listing 2. Drawing the Same Geometry Many Times
glBindVertexArray(grass_vao);
for (int n = 0; n < number_of_blades_of_grass; n++) {
    SetupGrassBladeParameters();
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 6);
}
How many blades of grass are there in a field? What is the value of number_of_blades_of_grass?
It could be thousands, maybe millions. Each blade of grass is likely to
take up a very small area on the screen, and the number of vertices
representing the blade is also very small. Your graphics card doesn’t
really have a lot of work to do to render a single blade of grass, and
the system is likely to spend most of its time sending commands to
OpenGL rather than actually drawing anything. OpenGL addresses this
through instanced rendering, a method for drawing many copies of the
same geometry with a single function call. This functionality is
accessed through instanced rendering functions, such as
void glDrawArraysInstanced(GLenum mode, GLint first, GLsizei count, GLsizei primcount);
and
void glDrawElementsInstanced(GLenum mode, GLsizei count, GLenum type, const void * indices, GLsizei primcount);
These two functions behave much like glDrawArrays and glDrawElements, except that they tell OpenGL to render primcount copies of the geometry. The first parameters of each (mode, first, and count for glDrawArraysInstanced, and mode, count, type, and indices for glDrawElementsInstanced)
take the same meaning as in the regular, noninstanced versions of the
functions. When you call one of these functions, OpenGL makes any
preparations it needs to draw your geometry (such as copying vertex data
to the graphics card’s memory) only once and then renders
the same vertices many times.
If all that these functions did were send many copies of the same vertices to OpenGL as if glDrawArrays or glDrawElements
had been called in a tight loop, they wouldn’t be very useful. One of
the things that makes instanced rendering usable and very powerful is a
special, built-in variable in GLSL named gl_InstanceID. The gl_InstanceID variable appears in GLSL as if it were an integer uniform. When the first copy of the vertices is sent to OpenGL, gl_InstanceID will be zero. It will then be incremented once for each copy of the geometry and will eventually reach primcount - 1. Because gl_InstanceID is an integer, there is a practical upper limit of a couple of billion instances that you can render in one call to glDrawArraysInstanced or glDrawElementsInstanced,
but that should be enough for the vast majority of applications. If you
need to render more than two billion copies of your geometry, your
application will probably run very slowly anyway, and you won’t see a
significant performance penalty for breaking your rendering into blocks
of, say, one billion instances.
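If a scene ever did need more instances than fit in one call, the split into blocks could be sketched on the CPU side roughly as follows. This is only an illustration under stated assumptions, not code from the original text: draw_in_batches, draw_batch_fn, and count_instances are hypothetical names, and the callback stands in for a real call to glDrawArraysInstanced. Because gl_InstanceID restarts at zero in every draw call, the sketch hands each batch's starting index to the callback so that it could be forwarded to the shader, for example as a uniform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type: issues one instanced draw of `count`
 * instances. `base` is the index of the first instance in the batch; a
 * real renderer might upload it as a uniform before calling
 * glDrawArraysInstanced with primcount = count. */
typedef void (*draw_batch_fn)(uint64_t base, int32_t count, void *user);

/* Split `total` instances into batches of at most `max_batch` instances
 * and invoke `draw` once per batch. Returns the number of batches. */
static uint64_t draw_in_batches(uint64_t total, int32_t max_batch,
                                draw_batch_fn draw, void *user)
{
    assert(max_batch > 0);
    uint64_t batches = 0;
    uint64_t base = 0;
    while (base < total) {
        uint64_t remaining = total - base;
        int32_t count = remaining < (uint64_t)max_batch
                            ? (int32_t)remaining
                            : max_batch;
        draw(base, count, user);
        base += (uint64_t)count;
        batches++;
    }
    return batches;
}

/* Example callback that only tallies instances; it stands in for a draw. */
static void count_instances(uint64_t base, int32_t count, void *user)
{
    (void)base;
    *(uint64_t *)user += (uint64_t)count;
}
```

With a batch size of one billion, 2.5 billion instances would be issued as three calls: two full batches and one of 500 million.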
The glDrawArraysInstanced function essentially operates as if the code in Listing 3 were executed.
Listing 3. Pseudo-code Illustrating the Behavior of glDrawArraysInstanced
// Loop over all of the instances (i.e. primcount)
for (int n = 0; n < primcount; n++) {
    // Set the gl_InstanceID uniform – here gl_InstanceID is a C variable
    // holding the location of the 'virtual' gl_InstanceID uniform.
    glUniform1i(gl_InstanceID, n);
    // Now, when we call glDrawArrays, the gl_InstanceID variable in the
    // shader will contain the index of the instance that's being rendered.
    glDrawArrays(mode, first, count);
}
Likewise, the glDrawElementsInstanced function operates similarly to the code in Listing 4.
Listing 4. Pseudo-code Illustrating the Behavior of glDrawElementsInstanced
for (int n = 0; n < primcount; n++) {
    // Set the value of gl_InstanceID
    glUniform1i(gl_InstanceID, n);
    // Make a normal call to glDrawElements
    glDrawElements(mode, count, type, indices);
}
Of course, gl_InstanceID is not a real uniform, and you can’t get a location for it by calling glGetUniformLocation. The value of gl_InstanceID
is managed by OpenGL and is very likely generated in hardware, meaning
that it’s essentially free to use in terms of performance. The power of
instanced rendering comes from imaginative use of this variable, along
with instanced arrays, which are explained in a moment.
The value of gl_InstanceID
can be used directly as a parameter to a shader function or to index
into data such as textures or uniform arrays. To return to our example
of the field of grass, let’s figure out what we’re going to do with gl_InstanceID
to make our field not just be thousands of identical blades of grass
growing out of a single point. Each of our grass blades is made out of a
little triangle strip with four triangles in it, a total of just six
vertices. It could be tricky to get them to all look different. However,
with some shader magic, we can make each blade of grass look
sufficiently different so as to produce an interesting output. We won’t
go over the shader code here, but we walk through a few ideas of how you can
use gl_InstanceID to add variation to your scenes.
First, we need each blade of
grass to have a different position; otherwise, they’ll all be drawn on
top of each other. Let’s arrange the blades of grass more or less
evenly. If the number of blades of grass we’re going to render is a
power of two, we can use half the bits of gl_InstanceID
to represent the x coordinate of a blade and the other half to
represent the z coordinate (our ground lies in the x-z plane, with y
being altitude). For this example, we render 2^20, or a little over a
million blades of grass (actually 1,048,576 blades, but who’s
counting?). By using the ten least significant bits (bits 9 through 0)
as the x coordinate and the ten most significant bits (19 through 10) as
the z coordinate, we have a uniform grid of grass blades. Let’s take a
look at Figure 2 to see what we have so far.
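The bit split described above can be emulated on the CPU to check the arithmetic. The following is a minimal sketch rather than the shader itself; grid_cell and instance_to_cell are hypothetical names, and the instance index is passed in as an ordinary unsigned integer in place of gl_InstanceID.

```c
#include <assert.h>
#include <stdint.h>

/* Integer grid position of one blade in the x-z plane. */
typedef struct {
    uint32_t x;
    uint32_t z;
} grid_cell;

/* Split an instance index into grid coordinates: the low `bits_per_axis`
 * bits give x and the next `bits_per_axis` bits give z. With
 * bits_per_axis = 10 this covers a 1024 x 1024 grid (2^20 blades). */
static grid_cell instance_to_cell(uint32_t instance_id, unsigned bits_per_axis)
{
    assert(bits_per_axis > 0 && bits_per_axis <= 16);
    uint32_t mask = (1u << bits_per_axis) - 1u;
    grid_cell c;
    c.x = instance_id & mask;
    c.z = (instance_id >> bits_per_axis) & mask;
    return c;
}
```

For example, instance 1024 with ten bits per axis lands at x = 0, z = 1: the first blade of the second row.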
Our uniform grid of grass
probably looks a little plain, as if a particularly attentive
groundskeeper hand-planted each blade. What we really need to do is
displace each blade of grass by some random amount within its grid
square. That’ll make the field look a little less uniform. A simple way
of generating random numbers is to multiply a seed value by a large
number, take a subset of the bits of the resulting product, and use it as the
input to a function. We’re not aiming for a perfect distribution here,
so this simple generator should do. Usually, with this type of
algorithm, you’d reuse the seed value as input to the next iteration of
the random number generator. In this case, though, we can just use gl_InstanceID directly, because we’re really generating the next few numbers after gl_InstanceID
in a pseudo-random sequence. By iterating over our pseudo-random
function only a couple of times, we can get a reasonably random
distribution. Because we need to displace in both x and z, we generate
two successive random numbers from gl_InstanceID and use them to displace the blade of grass within the plane. Look at Figure 3 to see what we get now.
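A generator of this multiply-and-take-bits kind might look like the sketch below, written in C so the arithmetic can be checked outside a shader. Everything specific here is an assumption for illustration: the multiplier 1664525, the added constant 1013904223 (included so that a seed of zero does not map to zero), the choice of which bits to keep, and the names random_step, to_unit_float, and blade_offset. The original text does not give its constants.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement of a blade within its grid square, each in [0, 1). */
typedef struct {
    float dx;
    float dz;
} offset2;

/* One generator step: multiply the seed by a large constant, add an
 * offset, and keep bits 8..39 of the 64-bit result. The constants are
 * arbitrary illustrative choices. */
static uint32_t random_step(uint32_t seed)
{
    uint64_t product = (uint64_t)seed * 1664525u + 1013904223u;
    return (uint32_t)(product >> 8);
}

/* Map the top 24 bits of r to a float in [0, 1). */
static float to_unit_float(uint32_t r)
{
    return (float)(r >> 8) * (1.0f / 16777216.0f);
}

/* Iterate the generator a couple of times starting from the instance
 * index, then use two successive outputs as the x and z displacement. */
static offset2 blade_offset(uint32_t instance_id)
{
    uint32_t rx = random_step(random_step(instance_id));
    uint32_t rz = random_step(rx);
    offset2 o = { to_unit_float(rx), to_unit_float(rz) };
    assert(o.dx >= 0.0f && o.dx < 1.0f && o.dz >= 0.0f && o.dz < 1.0f);
    return o;
}
```

Adding the returned offsets to a blade's integer grid coordinates keeps the blade inside its own grid square, and the same instance index always yields the same displacement.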
At this point, our
field of grass is distributed evenly with random perturbations in
position for each blade of grass. All the grass blades look the same,
though. (Actually, we used the same random number generator to assign a
slightly different color to each blade of grass just so that they’d show
up in the figures.) We can apply some variation over the field to make
each blade look slightly different. This is something that we’d probably
want to have control over, so we use a texture to hold information
about blades of grass.
You have an x and a z coordinate for each blade of grass that was calculated by generating a grid coordinate directly from gl_InstanceID
and then generating a random number and displacing the blade within the
x-z plane. That coordinate pair can be used as a coordinate to look up a
texel within a 2D texture, and you can put whatever you want in it.
Let’s control the length of the grass using the texture. We can put a
length parameter in the texture (let’s use the red channel) and multiply
the y coordinate of each vertex of the grass geometry by that to make
longer or shorter grass. A value of zero in the texture would produce
very short (or nonexistent) grass, and a value of one would produce
grass of some maximum length. Now you can design a texture where each
texel represents the length of the grass in a region of your field. Why
not draw a few crop circles? The texture can be sampled with GL_LINEAR sampling, and you can even use mipmapping.
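To make the length lookup concrete, here is a small CPU-side sketch of the same idea. It is an emulation under assumptions, not the shader: the parameter texture is modeled as a plain array of texels, the lookup picks the nearest texel (GL_LINEAR would blend neighboring texels instead), and texel, clamp01, sample_nearest, and grass_vertex_height are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One texel of the parameter texture; only r (length) is used here. */
typedef struct {
    float r, g, b, a;
} texel;

static float clamp01(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Nearest-texel lookup with normalized coordinates u, v in [0, 1]. */
static texel sample_nearest(const texel *map, size_t width, size_t height,
                            float u, float v)
{
    assert(map != NULL && width > 0 && height > 0);
    size_t ix = (size_t)(clamp01(u) * (float)width);
    size_t iy = (size_t)(clamp01(v) * (float)height);
    if (ix >= width)  ix = width - 1;
    if (iy >= height) iy = height - 1;
    return map[iy * width + ix];
}

/* Scale a vertex's y coordinate by the length stored in the red channel
 * at the blade's position. blade_x and blade_z are the blade's position
 * in a square field that is field_size units on each side. */
static float grass_vertex_height(float y, float blade_x, float blade_z,
                                 float field_size,
                                 const texel *map, size_t width, size_t height)
{
    assert(field_size > 0.0f);
    texel t = sample_nearest(map, width, height,
                             blade_x / field_size, blade_z / field_size);
    return y * t.r;
}
```

A texel with a red value of 0.5 halves the height of every blade that falls inside the region it covers, and a value of zero flattens those blades to nothing.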
Now, the grass is
evenly distributed over the field, and you have control of the length of
the grass in different areas. However, the grass blades are still just
scaled copies of each other. Perhaps we can introduce some more
variation. Next, we rotate each blade of grass around its axis according
to another parameter from the texture. We use the green channel of the
texture to store the angle through which the grass blade should be
rotated around the y-axis, with zero representing no rotation and one
representing a full 360 degrees. We’ve still only done one texture fetch
in our vertex shader, and still the only input to the shader is gl_InstanceID. Things are starting to come together. Take a look at Figure 4.
Our field is still looking
a little bland. The grass just sticks straight up and doesn’t move.
Real grass sways in the wind and gets flattened when things roll over
it. We need the grass to bend, and we’d like to have control over that.
Why not use another channel from the parameter texture (the blue
channel) to control a bend factor? We can use that as another angle and
rotate the grass around the x-axis before we apply the rotation in the
green channel. This allows us to make the grass bend over based on the
parameter in the texture. Use zero to represent no bending (the grass
stands straight up) and one to represent fully flattened grass.
Normally, the grass will sway gently, and so the parameter will have a
low value. When the grass gets flattened, the value can be much higher.
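The order of the two rotations can be written out as a matrix product. In the sketch below, g and b are the green and blue channel values in the range zero to one and p is a vertex of the blade. Mapping b = 1 to a quarter turn (so that a fully flattened blade lies in the ground plane) and the right-handed rotation convention are assumptions made for illustration; the mapping of g = 1 to a full 360 degrees comes from the text.

```latex
% Bend about the x-axis first, then rotate about the y-axis.
\[
\mathbf{p}' = R_y(2\pi g)\, R_x\!\left(\tfrac{\pi}{2}\, b\right) \mathbf{p},
\qquad
R_x(\theta) =
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\theta & -\sin\theta \\
0 & \sin\theta & \cos\theta
\end{pmatrix},
\qquad
R_y(\phi) =
\begin{pmatrix}
\cos\phi & 0 & \sin\phi \\
0 & 1 & 0 \\
-\sin\phi & 0 & \cos\phi
\end{pmatrix}
\]

% Worked example: blade tip p = (0, 1, 0), with b = 1 and g = 0.25.
\[
R_x\!\left(\tfrac{\pi}{2}\right) (0, 1, 0)^{T} = (0, 0, 1)^{T},
\qquad
R_y\!\left(\tfrac{\pi}{2}\right) (0, 0, 1)^{T} = (1, 0, 0)^{T}
\]
```

In the worked example the blade tip is first laid flat along the z-axis and then swung around to point along the x-axis, so the green channel ends up choosing the direction in which a bent blade leans.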
Finally, we can control the
color of the grass. It seems logical to just store the color of the
grass in a large texture. This might be a good idea if you want to draw a
sports field with lines, markings, or advertising on it for example,
but it’s fairly wasteful if the grass is all varying shades of green.
Instead, let’s make a palette for our grass in a 1D texture and use the
final channel within our parameter texture (the alpha channel) to store
the index into that palette. The palette can start with an anemic
looking dead-grass yellow at one end and a lush, deep green at the other
end. Now we read the alpha channel from the parameter texture along
with all the other parameters and use it to index into the 1D texture—a
dependent texture fetch. Our final field is shown in Figure 5.
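The palette step can be emulated with a plain array standing in for the 1D texture. This is a sketch under assumptions, not the shader: it uses a nearest-entry lookup rather than filtered sampling, and rgb and palette_lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One palette entry. */
typedef struct {
    float r, g, b;
} rgb;

/* Use the alpha value read from the parameter texture (0 to 1) as a
 * coordinate into a palette of `count` entries and return the nearest
 * entry. Values outside [0, 1] are clamped. */
static rgb palette_lookup(const rgb *palette, size_t count, float alpha)
{
    assert(palette != NULL && count > 0);
    if (alpha < 0.0f) alpha = 0.0f;
    if (alpha > 1.0f) alpha = 1.0f;
    size_t i = (size_t)(alpha * (float)count);
    if (i >= count) i = count - 1;
    return palette[i];
}
```

With a four-entry palette running from dead-grass yellow to deep green, an alpha of 0.0 selects the first entry and an alpha of 1.0 selects the last.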
Now, our final field has a
million blades of grass, evenly distributed, with application control
over length, “flatness,” direction of bend or sway, and color. Remember,
the only input to the shader that differentiates one blade of grass
from another is gl_InstanceID, the
total amount of geometry sent to OpenGL is six vertices, and the total
amount of code required to draw all the grass in the field is a single
call to glDrawArraysInstanced.
The
parameter texture can be read using linear texturing to provide smooth
transitions between regions of grass and can be a fairly low resolution.
If you want to make your grass wave in the wind or get trampled as
hordes of armies march across it, you can animate the texture by
updating it every frame or two and uploading a new version of it before
you render the grass. Also, because gl_InstanceID
is used to generate random numbers, adding an offset to it before
passing it to the random number generator allows a different but
predetermined chunk of “random” grass to be generated with the same
shader.