Open GL : Storing Transformed Vertices—Transform Feedback (part 3)

10/17/2012 4:21:01 AM

Example Uses for Transform Feedback

Here are a couple of examples of how you might use data stored in a transform feedback buffer. Remember, though, OpenGL is very flexible, and there are a myriad of other potential applications for transform feedback.

Storing Intermediate Results

The first example usage for transform feedback is the storage of intermediate results. You already read about instanced rendering. Consider an algorithm that performs a set of operations per instance and then requires the results of those operations per vertex. Now imagine that you want to render many copies of the object using instanced rendering. You could set up a vertex shader that uses as its input a few instanced arrays and a few regular, per-vertex attributes. All of those per-instance calculations would have to be performed for every copy of the object, even though they produce identical results each time.

Instead of writing one large vertex shader that does all of the calculations in a single pass, you can break this kind of algorithm into two passes. Write a first vertex shader that calculates the common per-instance results and writes them as a set of output varyings into a transform feedback buffer. This shader now needs to run only once per instance. Next, write a second vertex shader that performs the rest of the calculations (those that will be different for each copy of the object) and combines them with the intermediate results from the first vertex shader by reading the per-instance attributes using an instanced array.

Now that you have your pair of shaders, you can run the first shader once for each instance (using a regular glDrawArrays command) and then use the second to actually render each copy of the object. The first shader (the per-instance one) should be run with rasterization off (using the GL_RASTERIZER_DISCARD enable discussed earlier). This produces the intermediate results in the transform feedback buffer without actually rendering anything. Now, turn rasterization back on and render all of the individual copies of the object using the second shader and a call to one of the instanced rendering functions such as glDrawArraysInstanced.
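The saving from splitting the work can be illustrated with a CPU-side analogy in plain C. This is only a sketch of the data flow, not OpenGL code: the instance_input and instance_result types, the two pass functions, and the placeholder "scale" calculation are all hypothetical names invented for illustration. The stored array stands in for the transform feedback buffer, and the counter shows that the per-instance work runs once per instance rather than once per vertex of every copy.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Hypothetical per-instance input and the intermediate result derived from it. */
typedef struct { float base_size; float growth; vec3 origin; } instance_input;
typedef struct { float scale; vec3 origin; } instance_result;

/* Counts how often the per-instance work runs. */
static size_t per_instance_evaluations = 0;

/* Pass 1: stands in for the first vertex shader. It runs once per instance
 * and writes into `out`, which plays the role of the transform feedback
 * buffer. */
static void pass1_per_instance(const instance_input *in, instance_result *out,
                               size_t instance_count)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < instance_count; i++) {
        /* Placeholder for the costly per-instance calculation. */
        out[i].scale = in[i].base_size * in[i].growth;
        out[i].origin = in[i].origin;
        per_instance_evaluations++;
    }
}

/* Pass 2: stands in for the second vertex shader. Each vertex of each copy
 * reads the stored result for its instance, as an instanced array would
 * supply it. */
static void pass2_per_vertex(const instance_result *inst, size_t instance_count,
                             const vec3 *mesh, size_t vertex_count, vec3 *out)
{
    for (size_t i = 0; i < instance_count; i++) {
        for (size_t v = 0; v < vertex_count; v++) {
            vec3 r = { mesh[v].x * inst[i].scale + inst[i].origin.x,
                       mesh[v].y * inst[i].scale + inst[i].origin.y,
                       mesh[v].z * inst[i].scale + inst[i].origin.z };
            out[i * vertex_count + v] = r;
        }
    }
}
```

With two instances of a three-vertex mesh, pass 1 does its work twice, whereas a single combined shader would repeat it for all six vertices.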

Iterative or Recursive Algorithms

Many algorithms are recursive, recirculating results from one step to another. Physical simulations are a prime example of this type of algorithm, and transform feedback is an ideal way to produce data that is reused in subsequent passes. Because transform feedback writes data into buffers in a format that allows those buffers to be subsequently bound as vertex buffers, no conversion or copying is required between passes over the data. All that is required is a simple double-buffering scheme.

A good example of a recirculating algorithm is a particle system simulation. At each step in the simulation, each particle has a position and a velocity that must be updated. It may also have some fixed parameters such as mass, color, or any number of other attributes. To produce a simple particle system using transform feedback, each particle can be represented as a vertex and its attributes stored in vertex buffers. A vertex shader can be constructed that calculates an updated position and velocity for the particles in the system. The particle parameters that don’t change between iterations of the particle system can be stored in one vertex buffer, best allocated using the GL_STATIC_DRAW usage mode. The parameters that change between iterations should be double-buffered. One buffer is used as a vertex buffer and the source of parameters for rendering the particle system. The second buffer is bound as a transform feedback buffer, and the vertex shader writes the updated parameters into it. Between each iteration, the two buffers are swapped.

When the particle system is rendered, a time-step is passed to the vertex shader to indicate how much time has passed since the last update. The vertex shader calculates the approximate force on the particle due to its mass (gravity), input velocity (wind resistance), and any other factors important to the application; integrates the particle’s velocity over the appropriate time-step; and produces a new position and velocity.
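The update step and the buffer swap described above can be sketched on the CPU in plain C. This is an illustrative stand-in for the vertex shader, not the GPU implementation: the particle_state type, the function names, and the linear drag model are assumptions made for the example. The src array plays the role of the vertex buffer being read, dst the role of the transform feedback buffer being written, and mass is the static, per-particle data.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Per-particle state that changes every step (the double-buffered data). */
typedef struct { vec3 position; vec3 velocity; } particle_state;

/* One simulation step with explicit Euler integration: reads `src`, writes
 * `dst`. The force is the particle's weight (mass * gravity) minus a linear
 * drag proportional to its velocity (an assumed wind-resistance model). */
static void step_particles(const particle_state *src, particle_state *dst,
                           const float *mass, size_t count,
                           vec3 gravity, float drag, float dt)
{
    assert(src != dst); /* input and output must be different buffers */
    for (size_t i = 0; i < count; i++) {
        vec3 p = src[i].position;
        vec3 v = src[i].velocity;
        float m = mass[i];
        /* a = F / m */
        vec3 a = { (m * gravity.x - drag * v.x) / m,
                   (m * gravity.y - drag * v.y) / m,
                   (m * gravity.z - drag * v.z) / m };
        dst[i].position = (vec3){ p.x + v.x * dt, p.y + v.y * dt, p.z + v.z * dt };
        dst[i].velocity = (vec3){ v.x + a.x * dt, v.y + a.y * dt, v.z + a.z * dt };
    }
}

/* Exchange the roles of the two buffers between iterations. */
static void swap_buffers(particle_state **a, particle_state **b)
{
    particle_state *tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Each iteration calls step_particles(src, dst, ...) and then swap_buffers(&src, &dst), so the output of one step becomes the input of the next without copying any data.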

To simply render the particles as points, send the particles to OpenGL using a command such as glDrawArrays with GL_POINTS as the primitive type. You may want to only update the particle positions using transform feedback but draw something more complex at each particle (a ball, or spaceship, for example). You can do this by enabling GL_RASTERIZER_DISCARD to turn off rasterization during the update phase and then use the position data as an input to a second pass that turns the points into more complex sets of geometry for rendering on the screen.

An In-Depth Example of Transform Feedback—Flocking

Let’s combine these two examples into one and create an implementation of a flocking algorithm. Flocking algorithms show emergent behavior within a large group by updating the properties of individual members independently of all others. This kind of behavior is regularly seen in nature, and examples are swarms of bees, flocks of birds, and schools of fish apparently moving in unison even though the members of the group don’t communicate directly. The decisions made by an individual are based solely on its perception of the other members of the group. However, no collaboration is made between members over the outcome of any particular decision. This means that each group member’s new properties can be calculated in parallel—ideal for a GPU implementation.

To demonstrate both of the ideas outlined previously (storing intermediate results and iterative algorithms), we implement the flocking algorithm with a pair of vertex shaders. We represent each member of the flock as a single vertex. Each vertex has a position and a velocity updated by the first vertex shader. The result is written to a buffer using transform feedback. That buffer is then bound as a vertex buffer and used as an instanced input to the second shader. Each member of the flock is an instance in the second draw. The second vertex shader is responsible for transforming a mesh (perhaps a model of a bird) into the position and orientation calculated in the first vertex shader. The algorithm then iterates, starting again with the first vertex shader, reusing the positions and velocities calculated in the previous pass. No data leaves the graphics card’s memory, and the CPU is not involved in any calculations.

The data structures we need in this example are a set of VAOs to represent the vertex array state for each pass and a set of VBOs to hold the positions and velocities of the members within the flock and the vertex data for the model we use to represent them. The flock positions and velocities need to be double-buffered because we can’t read and write the same buffer at the same time using transform feedback. Also, because each member of the flock (vertex) needs access to the current position and velocity of all the other members of the flock, we bind the position and velocity buffers to a pair of texture buffer objects (TBOs) simultaneously. That way, the vertex shader can read arbitrarily from the TBO to access the properties of other vertices.

Figure 3 illustrates the passes that the algorithm makes.

Figure 3. Stages in the iterative flocking algorithm.

In (a), we perform the update for an even frame. The first position and velocity buffers are bound as input to the vertex shader, and the second position and velocity buffers are bound as transform feedback buffers. Notice that we also use the first set of position and velocity buffers as backing for textures (actually TBOs) that are used by the vertex shader. Next we render, in (b), using the same set of buffers as inputs as in the update pass. We use the same buffers as input in both the update and render passes so that the render pass has no dependency on the update pass. That means that OpenGL may be able to start working on the render pass before the update pass has finished. The position and velocity buffers are now instanced, and the additional geometry buffer is used to provide vertex position data.

In (c), we move to the next frame. The buffers have been exchanged—the second set of buffers is now the input to the vertex shader, and the first set is written using transform feedback. Finally, in (d), we render the odd frames. The second set of buffers is used as input to the vertex shader. Notice, though, that the flock_geometry buffer is a member of both render_vao1 and render_vao2 because the same data is used in both passes, and so we don’t need two copies of it.
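The four stages reduce to a simple rule based on frame parity. As a minimal sketch in plain C (the frame_bindings type and bindings_for_frame function are hypothetical names, not part of the example program), the selection of buffer sets for each role could be written as:

```c
#include <assert.h>

/* Which of the two buffer sets (1 or 2) each role uses on a given frame. */
typedef struct {
    int update_input_set;  /* update VAO and TBOs read from this set */
    int feedback_set;      /* transform feedback writes into this set */
    int render_input_set;  /* render VAO reads instanced data from this set */
} frame_bindings;

static frame_bindings bindings_for_frame(unsigned frame_index)
{
    frame_bindings b;
    /* Even frames read set 1 and write set 2; odd frames do the opposite. */
    b.update_input_set = (frame_index & 1u) ? 2 : 1;
    b.feedback_set     = (frame_index & 1u) ? 1 : 2;
    /* Rendering reads the same set as the update pass, so the render pass
     * does not depend on the update pass of the same frame. */
    b.render_input_set = b.update_input_set;
    /* A set is never read and written in the same frame. */
    assert(b.update_input_set != b.feedback_set);
    return b;
}
```

The set written on one frame is always the set read on the next, which is what lets the simulation iterate without copying data between buffers.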

The code to set all that up is shown in Listing 2. It isn’t particularly complex, but there is a fair amount of repetition, making it long. The listing contains the bulk of the initialization, with some parts omitted for brevity (those parts are indicated by *** in the comments).

Listing 2. Initializing Data Structures for the Flocking Example

// Create the four VAOs: update_vao1, update_vao2, render_vao1 and
// render_vao2. Yes, we could use an array, but for the purposes of this
// example, this is more explicit
glGenVertexArrays(1, &update_vao1);
// *** Create update_vao2, render_vao1 and render_vao2 the same way

// Create the buffer objects. We'll bind and initialize them in a moment
glGenBuffers(1, &flock_positions1);
// *** Create flock_positions2, flock_velocities1, flock_velocities2 and
// flock_geometry the same way

// Set up the VAOs and buffers – first update_vao1
glBindVertexArray(update_vao1);
glBindBuffer(GL_ARRAY_BUFFER, flock_positions1);
// *** Put some initial positions in flock_positions1 here
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, flock_velocities1);
// *** Initialize flock_velocities1 with zeroes
//     (glBufferData(... NULL), glMapBuffer, memset, for example)
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(1);

// Next, update_vao2
// *** This is pretty much the same as update_vao1, except we don't need
// *** initial data for flock_positions2 or flock_velocities2 because
// *** they'll be written on the first pass. We do need to allocate them
// *** using glBufferData(... NULL), though

// Now the render VAOs – render_vao1 first
// We bind the same flock_positions1 and flock_velocities1 buffers to this
// VAO, but this time they're instanced arrays. We also bind flock_geometry
// as a regular vertex array
glBindVertexArray(render_vao1);
glBindBuffer(GL_ARRAY_BUFFER, flock_positions1);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(0);
glVertexAttribDivisor(0, 1);
glBindBuffer(GL_ARRAY_BUFFER, flock_velocities1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(1);
glVertexAttribDivisor(1, 1);
glBindBuffer(GL_ARRAY_BUFFER, flock_geometry);
glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(2);

// Set up render_vao2
// *** This looks just like the setup for render_vao1, except we're using
// *** flock_positions2, and flock_velocities2. Note, though, that we'd
// *** still bind flock_geometry because that doesn't change from iteration
// *** to iteration.

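// Finally, set up the TBOs
glGenTextures(1, &position_texture1);
glBindTexture(GL_TEXTURE_BUFFER, position_texture1);
// Attach the buffer as the texture's data store. The three-component
// GL_RGB32F format matches the tightly packed vec3 data, but it needs
// OpenGL 4.0 or the ARB_texture_buffer_object_rgb32 extension
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, flock_positions1);
// *** Create a buffer texture for each of flock_velocities1, flock_positions2,
// *** and flock_velocities2 in the same way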

Once we have our buffers set up, we need to compile our shaders and link them together in a program. Before the program is linked, we need to bind the attributes in the vertex shader to the appropriate locations so that they match the vertex arrays that we set up. We also need to tell OpenGL which varyings we’re planning on writing to the transform feedback buffers. Listing 3 shows how the vertex attributes and transform feedback varyings are initialized.

Listing 3. Initializing Attributes and Transform Feedback for the Flocking Example

// *** Assume we've created our vertex and fragment shaders, compiled them
// *** and attached them to our program object.
// First, we'll set up the attributes in the update program
glBindAttribLocation(update_program, 0, "position");
glBindAttribLocation(update_program, 1, "velocity");
// Now the rendering program. The first two attributes are actually the
// same as those written by the update_program. The third is the position
// of the vertices in the geometry
glBindAttribLocation(render_program, 0, "instance_position");
glBindAttribLocation(render_program, 1, "instance_velocity");
glBindAttribLocation(render_program, 2, "position");
// Now we set up the transform feedback varyings:
static const char * tf_varyings[] = { "position_out", "velocity_out" };
glTransformFeedbackVaryings(update_program, 2, tf_varyings,
                            GL_SEPARATE_ATTRIBS);
// Now, everything's set up so we can go ahead and link our program objects
glLinkProgram(update_program);
glLinkProgram(render_program);

Now we need a rendering loop to update our flock positions and draw the members of the flock. It’s actually pretty simple, now that we have our data encapsulated in VAOs. The rendering loop is shown in Listing 4.

Listing 4. The Rendering Loop for the Flocking Example

// Make the update program current
glUseProgram(update_program);
// We use one set of buffers as shader inputs, and another as transform
// feedback buffers to hold the shader outputs. On alternating frames,
// we'll swap the two around
if (frame_index & 1) {
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, flock_positions1);
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1, flock_velocities1);
    glBindVertexArray(update_vao2);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_BUFFER, position_texture2);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_BUFFER, velocity_texture2);
} else {
    // *** This is the same again, only using flock_positions2 and
    // *** flock_velocities2 as transform feedback buffers, and update_vao1,
    // *** position_texture1 and velocity_texture1 as shader inputs
}
// Turn off rasterization (enable rasterizer discard)
glEnable(GL_RASTERIZER_DISCARD);
// Start transform feedback – record updated positions
glBeginTransformFeedback(GL_POINTS);
// Draw arrays – one point for each member of the flock
glDrawArrays(GL_POINTS, 0, flock_size);
// Done with transform feedback
glEndTransformFeedback();
// Ok, now we'll draw everything. Need to turn rasterization back on.
glDisable(GL_RASTERIZER_DISCARD);
// Use the rendering program
glUseProgram(render_program);
if (frame_index & 1) {
    glBindVertexArray(render_vao2);
} else {
    glBindVertexArray(render_vao1);
}
// Do an instanced draw – each member is an instance. The data updated
// by the 'update_program' on the last frame is now an instanced array
// in the render_program
glDrawArraysInstanced(GL_TRIANGLES, 0, 50, flock_size);
frame_index++;

That’s pretty much the interesting part of the program side. Let’s take a look at the shader side of things. The flocking algorithm works by applying a set of rules for each member of the flock to decide which direction to travel in. Each rule considers the current properties of the flock member and the properties of the other members of the flock as perceived by the individual being updated. Most of the rules require access to the other members’ position and velocity data, so update_program uses a pair of TBOs to read from the buffers containing that information. Listing 5 shows the start of the update vertex shader.

Listing 5. Declarations at the Start of the Flocking Update Vertex Shader

#version 150

precision highp float;

// These are the input attributes
in vec3 position;
in vec3 velocity;

// These get written to transform feedback buffers
out vec3 position_out;
out vec3 velocity_out;

// These are the TBOs that are mapped to the same buffers as position
// and velocity
uniform samplerBuffer tex_position;
uniform samplerBuffer tex_velocity;

// The number of members in the flock
uniform int flock_size;

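// Parameters for the simulation. The instance name lets the rules refer
// to members as parameters.rule1_weight and so on
uniform Parameters
{
    // *** Put all the simulation parameters here
} parameters;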

The main body of the program is simple. We read the position and velocity of the other members of the flock, apply each rule in turn, sum the resulting vectors, and output an updated position and velocity. Code to do this is given in Listing 6.

Listing 6. Main Body of the Flocking Update Vertex Shader

void main(void)
{
    vec3 other_position;
    vec3 other_velocity;
    vec3 acceleration = vec3(0.0);
    int i;

    for (i = 0; i < flock_size; i++) {
        other_position = texelFetch(tex_position, i).xyz;
        other_velocity = texelFetch(tex_velocity, i).xyz;
        acceleration += rule1(position, velocity,
                              other_position, other_velocity);
        acceleration += rule2(position, velocity,
                              other_position, other_velocity);
        // *** And so on... we can apply as many rules as we want.
        // *** Three or four is enough to produce a convincing
        // *** simulation
    }

    position_out = position + velocity;
    velocity_out = velocity + acceleration / float(flock_size);
}

Now we have to define our rules. The rules we use are as follows:

1. Members try not to hit each other. They need to stay at least a short distance from each other.
2. Members try to fly in the same direction as those around them.
3. Members try to keep with the rest of the flock. They will fly toward the center of the flock.

Listing 7 contains the shader code for the first rule. If we’re closer to another member than we’re supposed to be, we simply move away from that member:

Listing 7. The First Rule of Flocking

vec3 rule1(vec3 my_position, vec3 my_velocity,
           vec3 their_position, vec3 their_velocity)
{
    vec3 d = my_position - their_position;
    if (dot(d, d) < parameters.closest_allowed_position)
        return d * parameters.rule1_weight;
    return vec3(0.0);
}

Here’s the shader code for the second rule (see Listing 8). It returns a change in velocity weighted by the inverse square of the distance to the other member.

Listing 8. The Second Rule of Flocking

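vec3 rule2(vec3 my_position, vec3 my_velocity,
           vec3 their_position, vec3 their_velocity)
{
    vec3 d = their_position - my_position;
    vec3 dv = their_velocity - my_velocity;
    // Weight by the inverse square of the distance; the + 1.0 avoids a
    // division by zero when a member is compared with itself
    return parameters.rule2_weight * dv / (dot(d, d) + 1.0);
}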
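
No listing is given for the third rule, so here is one possible formulation, written as a CPU-side sketch in plain C rather than GLSL. It is an assumption, not the code used by the example program: the rule3 and cohesion names and the rule3_weight parameter are hypothetical. Each pairwise contribution steers toward the other member; because the update shader sums the contributions over the whole flock and divides by flock_size, the net effect is a pull toward the center of the flock.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Pairwise contribution of the third rule: steer toward the other member. */
static vec3 rule3(vec3 my_position, vec3 their_position, float rule3_weight)
{
    vec3 r = { (their_position.x - my_position.x) * rule3_weight,
               (their_position.y - my_position.y) * rule3_weight,
               (their_position.z - my_position.z) * rule3_weight };
    return r;
}

/* Averages rule3 over the flock, mirroring the loop and the final division
 * by flock_size in the update shader. The result equals
 * rule3_weight * (flock_center - my_position). */
static vec3 cohesion(const vec3 *positions, size_t flock_size, size_t self,
                     float rule3_weight)
{
    assert(flock_size > 0 && self < flock_size);
    vec3 sum = { 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < flock_size; i++) {
        vec3 r = rule3(positions[self], positions[i], rule3_weight);
        sum.x += r.x;
        sum.y += r.y;
        sum.z += r.z;
    }
    float n = (float)flock_size;
    return (vec3){ sum.x / n, sum.y / n, sum.z / n };
}
```

For a flock at (0, 0, 0), (2, 0, 0), and (4, 6, 0), the center is (2, 2, 0), so with a weight of 0.5 the first member receives a change in velocity of (1, 1, 0).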

Putting all this together along with any other rules we want to implement completes the update part of the program. Now we need to produce the second vertex shader—the one responsible for rendering the flock. This uses the position and velocity data as instanced arrays and transforms a fixed set of vertices into position based on the position and velocity of the individual member. Listing 9 shows the inputs to the shader.

Listing 9. Declarations of Inputs to the Flocking Rendering Vertex Shader

#version 150

precision highp float;

// These are the instanced arrays
in vec3 instance_position;
in vec3 instance_velocity;

// The regular geometry array
in vec3 position;

// Model-view-projection matrix used by the shader body
uniform mat4 mvp;

The body of our shader (given in Listing 10) simply transforms the mesh represented by position into the correct orientation and location for the particular instance.

Listing 10. Flocking Vertex Shader Body

void main(void)
{
    // rotate_to_match is a function that rotates a point
    // (position) around the origin to match a direction vector
    // (instance_velocity)
    vec3 local_position = rotate_to_match(position, instance_velocity);
    gl_Position = mvp * vec4(instance_position + local_position, 1.0);
}
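
The rotate_to_match function is not shown in the listings. The following plain C sketch shows one way such a function might work; it is an assumption, not the original shader code. It assumes the model's nose points along +Z, builds an orthonormal basis whose forward axis is the normalized velocity, and expresses the point in that basis. The helper names are hypothetical, and newton_sqrt is used in place of sqrtf() only so the sketch builds without linking the math library.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static vec3 cross3(vec3 a, vec3 b)
{
    return (vec3){ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

static vec3 scale3(vec3 v, float s) { return (vec3){ v.x * s, v.y * s, v.z * s }; }

/* Square root by Newton's method (stand-in for sqrtf). */
static float newton_sqrt(float s)
{
    assert(s >= 0.0f);
    if (s == 0.0f)
        return 0.0f;
    float x = s > 1.0f ? s : 1.0f;
    for (int i = 0; i < 80; i++)
        x = 0.5f * (x + s / x);
    return x;
}

/* Rotates `point` so that the model's +Z axis lines up with `direction`. */
static vec3 rotate_to_match(vec3 point, vec3 direction)
{
    float len2 = dot3(direction, direction);
    if (len2 < 1e-12f)
        return point; /* no heading yet: leave the point unrotated */
    vec3 forward = scale3(direction, 1.0f / newton_sqrt(len2));

    /* Pick a reference up vector that is not parallel to forward. */
    vec3 up = { 0.0f, 1.0f, 0.0f };
    if (forward.y > 0.999f || forward.y < -0.999f)
        up = (vec3){ 1.0f, 0.0f, 0.0f };

    vec3 side = cross3(up, forward);
    side = scale3(side, 1.0f / newton_sqrt(dot3(side, side)));
    vec3 real_up = cross3(forward, side);

    /* Express the point in the (side, real_up, forward) basis. */
    return (vec3){ side.x * point.x + real_up.x * point.y + forward.x * point.z,
                   side.y * point.x + real_up.y * point.y + forward.y * point.z,
                   side.z * point.x + real_up.z * point.y + forward.z * point.z };
}
```

Because the three basis vectors are orthonormal and right-handed, the transform is a pure rotation: it preserves the size and shape of the mesh and only changes its orientation.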
