Algorithms for Compiler Design: LOOP OPTIMIZATION

10.3 LOOP OPTIMIZATION

Loop optimization is the most valuable machine-independent optimization, because a program's inner loops are good candidates for improvement. The most important loop optimizations are the elimination of loop invariant computations and the elimination of induction variables. A loop invariant computation is one that computes the same value every time the loop is executed, so moving such a computation outside the loop reduces the execution time. Induction variables are variables whose values change in lock-step with each other on every iteration of a loop; hence, it may be possible to eliminate all of them except one.
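As a source-level illustration (a hypothetical C fragment, not one of the book's examples), the product x * y below does not change inside the loop, so it can be computed once in front of the loop; both versions produce the same results.

#include <stdio.h>

#define N 8

int main(void)
{
    int a[N], b[N], x = 3, y = 5, i, t;

    /* Before: x * y is recomputed on every iteration, although
       x and y do not change inside the loop. */
    for (i = 0; i < N; i++)
        a[i] = x * y + i;

    /* After loop-invariant code motion: the invariant computation is
       performed once, conceptually in the preheader of the loop. */
    t = x * y;
    for (i = 0; i < N; i++)
        b[i] = t + i;

    for (i = 0; i < N; i++)
        printf("%d %d\n", a[i], b[i]);   /* the two columns agree */
    return 0;
}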

1 Eliminating Loop Invariant Computations

To eliminate loop invariant computations, we first identify the invariant computations and then move them outside the loop, provided the move does not change the meaning of the program. Identifying loop invariant computations requires detecting the loops in the program, and whether a loop exists depends on the program's control flow; hence a control flow analysis is required. For loop detection we use a graphical representation, called a "program flow graph," that shows how control flows through the program. To obtain such a graph, we must partition the intermediate code into basic blocks, which requires identifying leader statements. Leaders are defined as follows:

  1. The first statement is a leader statement.

  2. The target of a conditional or unconditional goto is a leader.

  3. A statement that immediately follows a conditional or unconditional goto is a leader.

A basic block is a sequence of three-address statements that can be entered only at the beginning; control leaves the block only after the last statement is executed, with no halt or possibility of branching except at the end.

2 Algorithm to Partition Three-Address Code into Basic Blocks

To partition three-address code into basic blocks, we identify the leader statements and then, for each leader, include all statements from that leader up to, but not including, the next leader. The basic blocks into which the three-address code is partitioned constitute the nodes, or vertices, of the program flow graph. The edges of the flow graph are determined as follows: if B1 and B2 are two blocks, add an edge from B1 to B2 if B2 can follow B1 in an execution sequence, which happens if and only if:

  1. The first statement of block B2 immediately follows the last statement of block B1 in the three-address code, and the last statement of block B1 is not an unconditional goto statement.

  2. The last statement of block B1 is either a conditional or unconditional goto statement, and the first statement of block B2 is the target of the last statement of block B1.

For example, consider the following program fragment:

Fact(x)
{
int f = 1;
for(i = 2; i<=x; i++)
f = f*i;
return(f);
}

The three-address-code representation for the program fragment above is:

  1. f = 1

  2. i = 2

  3. if i > x goto(8)

  4. f = f * i

  5. t1 = i + 1

  6. i = t1

  7. goto(3)

  8. goto calling program

The leader statements are:

  • Statement number 1, because it is the first statement.

  • Statement number 3, because it is the target of a goto.

  • Statement number 4, because it immediately follows a conditional goto statement.

  • Statement number 8, because it is a target of a conditional goto statement.

Therefore, the basic blocks into which the above code can be partitioned are as follows, and the program flow graph is shown in Figure 1.

  • Block B1: statements 1 and 2

  • Block B2: statement 3

  • Block B3: statements 4, 5, 6, and 7

  • Block B4: statement 8

Figure 1: Program flow graph.
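The leader-based partitioning just illustrated can be sketched in C as follows. The encoding of the three-address statements (a kind and an optional goto target per statement) is an assumption made for the sketch, not a representation used by the book.

#include <stdio.h>

#define N 8                 /* number of three-address statements, numbered 1..N */

enum { PLAIN, COND_GOTO, UNCOND_GOTO };      /* statement kinds, for leader detection */

/* The example's eight statements: the kind of each, and the target of each goto.
   Statement 8 ("goto calling program") jumps outside this code, so its target is 0. */
static const int kind[N + 1]   = {0, PLAIN, PLAIN, COND_GOTO, PLAIN,
                                     PLAIN, PLAIN, UNCOND_GOTO, UNCOND_GOTO};
static const int target[N + 1] = {0, 0, 0, 8, 0, 0, 0, 3, 0};

int main(void)
{
    int leader[N + 2] = {0};
    int i, block = 0;

    leader[1] = 1;                              /* rule 1: the first statement        */
    for (i = 1; i <= N; i++) {
        if (kind[i] != PLAIN && target[i] != 0)
            leader[target[i]] = 1;              /* rule 2: the target of a goto       */
        if (kind[i] != PLAIN && i + 1 <= N)
            leader[i + 1] = 1;                  /* rule 3: the statement after a goto */
    }

    /* A basic block runs from a leader up to, but not including, the next leader. */
    for (i = 1; i <= N; i++)
        if (leader[i]) {
            int j = i + 1;
            while (j <= N && !leader[j])
                j++;
            printf("B%d: statements %d..%d\n", ++block, i, j - 1);
        }
    return 0;
}

Running the sketch marks statements 1, 3, 4, and 8 as leaders and prints the four blocks B1 through B4 listed above.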

3 Loop Detection

A loop is a cycle in the flow graph that satisfies two properties:

  1. It should have a single entry node or header, so that it will be possible to move all of the loop invariant computations to a unique place, called a "preheader," which is a block/node placed outside the loop, just in front of the header.

  2. It should be strongly connected; that is, it should be possible to go from any node of the loop to any other node while staying within the loop. This is required because, in a loop, at least some of the nodes must be executed repeatedly.

One or more loops/cycles exist in the program only if the flow graph contains one or more back edges. Therefore, we must identify any back edges in the flow graph.

4 Identification of the Back Edges

To identify the back edges in the flow graph, we compute the dominators of every node of the program flow graph. A node a is a dominator of a node b if every path from the initial node of the graph to node b goes through a. An edge whose head dominates its tail is a back edge. For example, consider the flow graph in Figure 2. In this flow graph, the only dominator of node 3 (other than node 3 itself) is node 1, because not all paths from node 1 to node 3 go through node 2.

Figure 2: The flow graph back edges are identified by computing the dominators.

Dominator (dom) relationships have the following properties:

  1. They are reflexive; that is, every node dominates itself.

  2. They are transitive; that is, if a dom b and b dom c, then a dom c.
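A minimal C sketch of the dominator computation, and of the back-edge test that follows from it, is shown below. The flow graph, its bit-set encoding, and the simple iterative computation are illustrative assumptions made for the sketch.

#include <stdio.h>

#define NODES 4                     /* nodes 0..NODES-1; node 0 is the initial node */
typedef unsigned int Set;           /* bit i set => node i is in the set */

/* A small hypothetical flow graph: 0 -> 1, 1 -> 2, 2 -> 1, 2 -> 3.
   The edge 2 -> 1 will turn out to be a back edge. */
static const int edges[][2] = { {0,1}, {1,2}, {2,1}, {2,3} };
static const int n_edges = sizeof edges / sizeof edges[0];

int main(void)
{
    Set dom[NODES], all = (1u << NODES) - 1;
    int i, e, changed;

    /* Iterative dominator computation:
       dom(initial) = {initial};
       dom(n)       = {n} U (intersection of dom(p) over all predecessors p of n). */
    dom[0] = 1u << 0;
    for (i = 1; i < NODES; i++)
        dom[i] = all;
    do {
        changed = 0;
        for (i = 1; i < NODES; i++) {
            Set d = all;
            for (e = 0; e < n_edges; e++)
                if (edges[e][1] == i)
                    d &= dom[edges[e][0]];
            d |= 1u << i;
            if (d != dom[i]) { dom[i] = d; changed = 1; }
        }
    } while (changed);

    for (i = 0; i < NODES; i++)
        printf("dom(%d) = %#x\n", i, dom[i]);    /* dominator sets as bit masks */

    /* An edge a -> b is a back edge if its head b dominates its tail a. */
    for (e = 0; e < n_edges; e++)
        if (dom[edges[e][0]] & (1u << edges[e][1]))
            printf("back edge: %d -> %d\n", edges[e][0], edges[e][1]);
    return 0;
}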

5 Reducible Flow Graphs

Several code-optimization transformations are easy to perform on reducible flow graphs. A flow graph G is reducible if and only if its edges can be partitioned into two disjoint groups, forward edges and back edges, with the following two properties:

  1. The forward edges form an acyclic graph in which every node can be reached from the initial node of G.

  2. The back edges consist only of edges whose heads dominate their tails.

For example, consider the flow graph shown in Figure 3. This flow graph has no back edges, because no edge's head dominates its tail. It could therefore be reducible only if the entire graph were acyclic; since the graph contains a cycle, it is not a reducible flow graph.

Figure 3: A flow graph with no back edges.

After identifying the back edges, if any, the natural loop of every back edge must be identified. The natural loop of a back edge a → b is the set of all nodes that can reach a without going through b, plus node b itself. Therefore, to find the natural loop of a back edge n → d, we start with node n and add all the predecessors of node n to the loop. Then we add the predecessors of the nodes that were just added, and we continue this process until we reach node d; the predecessors of d itself are not added. These nodes, plus node d, constitute the set of all nodes that can reach n without going through d, which is the natural loop of the edge n → d. The algorithm for detecting the natural loop of a back edge is:

Input:  back edge n → d.
Output: set loop, which is the set of nodes forming the natural
        loop of the back edge n → d.
main()
{
   loop = { d };    /* initialize by adding node d to the set loop */
   insert(n);       /* call the procedure insert with the node n   */
}
procedure insert(m)
{
   if m is not in loop then
   {
      loop = loop ∪ { m };
      for every predecessor p of m do
         insert(p);
   }
}
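The same procedure can also be written as a small, runnable C sketch. The graph below and its bit-set encoding are assumptions made for illustration; it is the graph used in the dominator sketch above, whose only back edge is 2 -> 1.

#include <stdio.h>

#define NODES 4
typedef unsigned int Set;           /* bit i set => node i is in the set */

/* Hypothetical flow graph: 0 -> 1, 1 -> 2, 2 -> 1, 2 -> 3 (back edge 2 -> 1). */
static const int edges[][2] = { {0,1}, {1,2}, {2,1}, {2,3} };
static const int n_edges = sizeof edges / sizeof edges[0];

static Set loop;                    /* the natural loop being built */

static void insert(int m)
{
    int e;
    if (loop & (1u << m))           /* m is already in the loop */
        return;
    loop |= 1u << m;
    for (e = 0; e < n_edges; e++)   /* add every predecessor of m */
        if (edges[e][1] == m)
            insert(edges[e][0]);
}

/* Natural loop of the back edge n -> d: node d plus every node that can
   reach n without passing through d. */
static Set natural_loop(int n, int d)
{
    loop = 1u << d;                 /* start with the header d ...         */
    insert(n);                      /* ... then work backwards from node n */
    return loop;
}

int main(void)
{
    Set l = natural_loop(2, 1);     /* the back edge 2 -> 1 */
    int i;
    printf("natural loop of 2 -> 1: {");
    for (i = 0; i < NODES; i++)
        if (l & (1u << i))
            printf(" %d", i);
    printf(" }\n");                 /* prints { 1 2 } */
    return 0;
}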

For example, in the flow graph shown in Figure 1, the only back edge is B3 → B2, and the corresponding loop comprises the blocks B2 and B3.

After the natural loops of the back edges are identified, the next task is to identify the loop invariant computations. A three-address statement x = y op z in a basic block B that is part of the loop is a loop invariant statement if all possible definitions of y and z that reach this statement lie outside the loop, or if y and z are constants, because then y op z computes the same value each time the statement is encountered in the loop. Hence, to decide whether the statement x = y op z is loop invariant, we must compute the ud (use-definition) chaining information.

The ud chaining information is computed by doing a global data flow analysis of the flow graph. All of the definitions that can reach the point immediately before the first statement of a basic block B are computed; we call the set of all such definitions IN(B). The set of all definitions that can reach the point immediately after the last statement of block B is called OUT(B). We compute IN(B) and OUT(B) for every block B using GEN(B) and KILL(B), which are defined as:

  • GEN(B): The set of all the definitions generated in block B.

  • KILL(B): The set of all the definitions outside block B that define the same variables as are defined in block B.

Consider the flow graph in Figure 4.

The GEN and KILL sets for the basic blocks are as shown in Table 1.

Table 1: GEN and KILL Sets for the Figure 4 Flow Graph

  Block    GEN        KILL
  B1       {1,2}      {6,10,11}
  B2       {3,4}      {5,8}
  B3       {5}        {4,8}
  B4       {6,7}      {2,9,11}
  B5       {8,9}      {4,5,7}
  B6       {10,11}    {1,2,6}

Figure 4: Flow graph with GEN and KILL block sets.

IN(B) and OUT(B) are defined by the following set of equations, which are called "data flow equations":

  IN(B)  = ∪ OUT(P), where the union is taken over all predecessors P of B
  OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)

The next step, therefore, is to solve these equations. If there are n nodes, there will be 2n equations in 2n unknowns, and the solution to these equations is not, in general, unique. The reason is that we may have a situation like the one shown in Figure 5, where a block B is a predecessor of itself.


Figure 5: Nonunique solution to a data flow equation, where B is a predecessor of itself.
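As a concrete illustration of this non-uniqueness, consider a hypothetical block B (not one of the blocks of Figure 4) whose only predecessor is B itself, with GEN(B) = {1} and KILL(B) = ∅. The equations reduce to IN(B) = OUT(B) and OUT(B) = IN(B) ∪ {1}. Then IN(B) = OUT(B) = {1} is a solution, but so is IN(B) = OUT(B) = {1, d} for any other definition d, since adding d to both sets still satisfies both equations. The smallest of these solutions, {1}, is the one we are after.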

If IN(B) = IN0 and OUT(B) = OUT0 is a solution to the data flow equations for block B, then IN(B) = IN0 ∪ {d} and OUT(B) = OUT0 ∪ {d}, where d is any definition not in IN0, OUT0, or KILL(B), also satisfy the equations. This is because, if we take OUT0 ∪ {d} as the value of OUT(B), then, since B is one of its own predecessors, the equation IN(B) = ∪ OUT(P) adds d to IN(B); and because d is not in KILL(B), we get IN(B) = IN0 ∪ {d}. Then, according to OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B), the value OUT(B) = OUT0 ∪ {d} is also satisfied. Therefore, IN0, OUT0 is one solution, and IN0 ∪ {d}, OUT0 ∪ {d} is another; there is no unique solution. What we are interested in is the smallest solution, that is, the smallest IN(B) and OUT(B) for every block B, consisting of the values that are present in every solution. For example, since IN0 is contained in IN0 ∪ {d} and OUT0 is contained in OUT0 ∪ {d}, the pair IN0, OUT0 is the smaller solution; and this is what we want, because the smallest IN(B) turns out to be exactly the set of all definitions reaching the point just before the beginning of B. The algorithm for computing the smallest IN(B) and OUT(B) is as follows:

  1. For each block B do

    {
    IN(B) = φ
    OUT(B) = GEN(B)
    }
  2. flag = true

  3. while (flag) do

    {
    flag = false
    for each block B do
    {
    INnew(B) = φ
    for each predecessor P of B do
    INnew(B) = INnew(B) ∪ OUT(P)
    if INnew(B) ≠ IN(B) then
    {
    flag = true
    IN(B) = INnew(B)
    OUT(B) = (IN(B) - KILL(B)) ∪ GEN(B)
    }
    }
    }

Initially, we take IN(B) to be the empty set and OUT(B) to be GEN(B) for every block B, and we compute INnew(B). If it differs from IN(B), we compute a new OUT(B) and go on to the next iteration. This continues until, for every block B, IN(B) is the same in the current iteration as in the previous one.
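A minimal C sketch of this iterative computation is given below, using bit sets over definition numbers. The three-block flow graph and its GEN and KILL values are illustrative assumptions, not the graph of Figure 4.

#include <stdio.h>

#define BLOCKS 3
typedef unsigned int Set;           /* bit d set => definition d is in the set */

/* Hypothetical flow graph B0 -> B1, B1 -> B2, B2 -> B1, so B1 has the two
   predecessors B0 and B2.  Definitions are numbered 1..4. */
static const int edges[][2] = { {0,1}, {1,2}, {2,1} };
static const int n_edges = sizeof edges / sizeof edges[0];
static const Set GEN[BLOCKS]  = { 0x06 /* {1,2} */, 0x08 /* {3} */, 0x10 /* {4} */ };
static const Set KILL[BLOCKS] = { 0x18 /* {3,4} */, 0x02 /* {1} */, 0x04 /* {2} */ };

int main(void)
{
    Set in[BLOCKS] = {0}, out[BLOCKS];
    int b, e, changed;

    for (b = 0; b < BLOCKS; b++)            /* initialization: IN = empty, OUT = GEN */
        out[b] = GEN[b];

    do {                                    /* iterate until no IN set changes */
        changed = 0;
        for (b = 0; b < BLOCKS; b++) {
            Set in_new = 0;
            for (e = 0; e < n_edges; e++)   /* IN(B) = union of OUT(P) over preds P */
                if (edges[e][1] == b)
                    in_new |= out[edges[e][0]];
            if (in_new != in[b]) {
                changed = 1;
                in[b]  = in_new;
                out[b] = (in[b] & ~KILL[b]) | GEN[b];   /* OUT = (IN - KILL) U GEN */
            }
        }
    } while (changed);

    for (b = 0; b < BLOCKS; b++)
        printf("B%d: IN = %#x  OUT = %#x\n", b, in[b], out[b]);
    return 0;
}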

For example, for the flow graph shown in Figure 4, the IN and OUT sets of the blocks are computed iteratively using the above algorithm, as shown in Tables 2–6.

Table 2: Initial IN and OUT Values for the Figure 4 Flow Graph

  Block    IN    OUT
  B1       Φ     {1,2}
  B2       Φ     {3,4}
  B3       Φ     {5}
  B4       Φ     {6,7}
  B5       Φ     {8,9}
  B6       Φ     {10,11}

Table 3: First Iteration for the IN and OUT Values

  Block    IN             OUT
  B1       Φ              {1,2}
  B2       {1,2,6,7}      {1,2,3,4,6,7}
  B3       {3,4,8,9}      {3,5,9}
  B4       {3,4,5}        {3,4,5,6,7}
  B5       {5}            {8,9}
  B6       {6,7}          {7,10,11}

Table 4: Second Iteration for the IN and OUT Values

  Block    IN                   OUT
  B1       Φ                    {1,2}
  B2       {1,2,3,4,5,6,7}      {1,2,3,4,6,7}
  B3       {1,2,3,4,6,7,8,9}    {1,2,3,5,6,7,9}
  B4       {1,2,3,4,5,6,7,9}    {1,3,4,5,6,7}
  B5       {3,5,9}              {3,8,9}
  B6       {3,4,5,6,7}          {3,4,5,7,10,11}

Table 5: Third Iteration for the IN and OUT Values

  Block    IN                   OUT
  B1       Φ                    {1,2}
  B2       {1,2,3,4,5,6,7}      {1,2,3,4,6,7}
  B3       {1,2,3,4,6,7,8,9}    {1,2,3,5,6,7,9}
  B4       {1,2,3,4,5,6,7,9}    {1,3,4,5,6,7}
  B5       {1,2,3,5,6,7,9}      {1,2,3,6,8,9}
  B6       {1,3,4,5,6,7}        {1,3,4,5,7,10,11}

Table 6: Fourth Iteration for the IN and OUT Values

  Block    IN                   OUT
  B1       Φ                    {1,2}
  B2       {1,2,3,4,5,6,7}      {1,2,3,4,6,7}
  B3       {1,2,3,4,6,7,8,9}    {1,2,3,5,6,7,9}
  B4       {1,2,3,4,5,6,7,9}    {1,3,4,5,6,7}
  B5       {1,2,3,5,6,7,9}      {1,2,3,6,8,9}
  B6       {1,3,4,5,6,7}        {1,3,4,5,7,10,11}

The next step is to compute the ud chains from the reaching definitions information, as follows.

If the use of A in block B is preceded, within B, by a definition of A, then the ud chain of this use contains only the last definition of A prior to the use. If the use of A in block B is not preceded by any definition of A within B, then the ud chain for this use consists of all the definitions of A in IN(B).

For example, in the flow graph for which IN and OUT were computed in Tables 2–6, the use of a in definition 4 of block B2 is preceded by definition 3, which is a definition of a. Hence, the ud chain for this use of a contains only definition 3. But the use of b in B2 is not preceded by any definition of b in B2, so the ud chain for this use of b is {1}, because definition 1 is the only definition of b in IN(B2).
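The two rules above can be sketched in C as follows. The encoding of one block B (which statement is which definition, which variable each definition defines, and IN(B)) is a small hypothetical setup chosen to mirror the example just given, not data taken from Figure 4.

#include <stdio.h>

#define MAXDEFS 32
typedef unsigned int Set;                 /* bit d set => definition number d */

/* One hypothetical block B: statement i of B is definition stmt_def[i]
   (0 if it defines nothing), and def_var[d] is the variable defined by d. */
static const int  stmt_def[] = { 0, 3, 0, 4 };            /* statements 0..3 of B */
static const char def_var[MAXDEFS] = { [1] = 'b', [3] = 'a', [4] = 'a' };
static const Set  in_B = (1u << 1) | (1u << 3);           /* IN(B) = {1, 3}, say  */

/* ud chain for a use of variable v at statement use_pos of block B. */
static Set ud_chain(char v, int use_pos)
{
    Set chain = 0;
    int i, d;

    /* Rule 1: if a definition of v precedes the use in the same block,
       the ud chain is just the last such definition. */
    for (i = use_pos - 1; i >= 0; i--)
        if (stmt_def[i] != 0 && def_var[stmt_def[i]] == v)
            return 1u << stmt_def[i];

    /* Rule 2: otherwise it consists of every definition of v in IN(B). */
    for (d = 1; d < MAXDEFS; d++)
        if ((in_B & (1u << d)) && def_var[d] == v)
            chain |= 1u << d;
    return chain;
}

int main(void)
{
    printf("ud chain of a used at statement 3: %#x\n", ud_chain('a', 3));  /* {3} */
    printf("ud chain of b used at statement 1: %#x\n", ud_chain('b', 1));  /* {1} */
    return 0;
}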

The ud chain information is used to identify the loop invariant computations. The next step is to perform the code motion, which moves a loop invariant statement to a newly created node, called "preheader," whose only successor is a header of the loop. All the predecessors of the header that lie outside the loop will become predecessors of the preheader.

But sometimes moving a loop invariant statement to the preheader is not possible, because such a move would alter the semantics of the program. For example, if a loop invariant statement occurs in a basic block that is not a dominator of all the exits of the loop (an exit of the loop being a node that has a successor outside the loop), then moving the statement to the preheader may change the semantics of the program. Therefore, before moving a loop invariant statement to the preheader, we must check whether the code motion is legal. Consider the flow graph shown in Figure 6.

Figure 6: A flow graph containing a loop invariant statement.

In the flow graph shown in Figure 6, the statement x = 2 is loop invariant. But it occurs in B3, which does not dominate the exit of the loop, so if we move it to the preheader, as shown in Figure 7, the value two will always be assigned to y in B5, whereas in the original program y in B5 may receive the value one as well as two.

Figure 7: Moving the loop invariant statement x = 2 to the preheader changes the semantics of the program.

In the flow graph shown in Figure 7, if x is not used outside the loop, then the statement x = 2 can be moved to the preheader. Therefore, for the code motion to be legal, the following conditions must be met (a sketch combining these checks appears after the list):

  1. The block in which the loop invariant statement occurs should be a dominator of all exits of the loop, or the variable assigned by the statement should not be used outside the loop.

  2. We cannot move a loop invariant statement that assigns to A into the preheader if there is another statement in the loop that also assigns to A. For example, consider the flow graph shown in Figure 8.

    Figure 8: Moving the preheader changes the meaning of the program.

    Even though the statement x = 3 in B2 satisfies condition (1), moving it to the preheader would change the meaning of the program: if x = 3 is moved to the preheader, then the value assigned to y in B5 will be two when the execution path is B1–B2–B3–B4–B2–B4–B5, whereas for the same execution path the original program assigns three to y in B5.

  3. The move is illegal if A is used in the loop and that use of A is reached by any definition of A other than the statement to be moved. For example, consider the flow graph shown in Figure 9.

Figure 9: Moving a value to the preheader changes the original meaning of the program.

Even though x is not used outside the loop, the statement x = 2 in block B2 cannot be moved to the preheader, because the use of x in B4 is also reached by the definition x = 1 in B1. If x = 2 were moved to the preheader, the use of x in B4 would always see the same value, which is not the case in the original program, where the definition x = 1 in B1 can also reach that use.
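The three conditions can be combined into a single legality test, sketched below in C. The bit-set encoding of blocks and the precomputed inputs (dominator information, loop exits, liveness of the variable outside the loop, and reaching-definition facts) are assumptions of the sketch; in a real optimizer they come from the analyses described above.

#include <stdio.h>

typedef unsigned int Set;               /* bit i set => block Bi is in the set */

/* Legality test for moving a loop-invariant statement s: "v = ..." that
   occurs in block b to the preheader. */
static int can_move_to_preheader(
    Set dominated_by_b,         /* blocks dominated by b                          */
    Set loop_exits,             /* loop blocks with a successor outside the loop  */
    int v_used_outside,         /* nonzero if v is used after the loop            */
    int defs_of_v_in_loop,      /* number of definitions of v inside the loop     */
    int other_def_reaches_use)  /* nonzero if a use of v in the loop is reached
                                   by a definition of v other than s              */
{
    /* Condition 1: b dominates every exit of the loop, or v is not
       used outside the loop. */
    if ((loop_exits & ~dominated_by_b) != 0 && v_used_outside)
        return 0;
    /* Condition 2: s must be the only statement in the loop assigning to v. */
    if (defs_of_v_in_loop != 1)
        return 0;
    /* Condition 3: no other definition of v may reach a use of v in the loop. */
    if (other_def_reaches_use)
        return 0;
    return 1;
}

int main(void)
{
    /* A situation like Figure 6 (block numbering assumed): x = 2 sits in B3,
       which dominates only itself, B4 is the loop exit, and x is used outside
       the loop, so the move is rejected. */
    printf("%d\n", can_move_to_preheader(1u << 3, 1u << 4, 1, 1, 0));   /* prints 0 */

    /* If x were not used outside the loop, the move would be allowed. */
    printf("%d\n", can_move_to_preheader(1u << 3, 1u << 4, 0, 1, 0));   /* prints 1 */
    return 0;
}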

