Algorithms for Compiler Design: SYNTAX-DIRECTED TRANSLATION SCHEMES TO SPECIFY THE TRANSLATION OF VARIOUS PROGRAMMING LANGUAGE CONSTRUCTS

7/24/2010 8:02:17 PM

6.7 SYNTAX-DIRECTED TRANSLATION SCHEMES TO SPECIFY THE TRANSLATION OF VARIOUS PROGRAMMING LANGUAGE CONSTRUCTS

Specifying the translation of the construct involves specifying the construct's syntactic structure, using CFG, and associating suitable semantic actions with the productions of the CFG. For example, if we want to specify the translation of the arithmetic expressions into postfix notation so they can be carried along with the parsing, and if the parsing method is LR, then first we write a grammar that specifies the syntactic structure of the arithmetic expressions. We then associate suitable semantic actions with the productions of the grammar. The expressions used for these associations are covered below.

1 Arithmetic Expressions

The grammar that specifies the syntactic structure of the expressions in a typical programming language will have the following productions:

Translating arithmetic expressions involves generating code to evaluate the given expression. Hence, for an expression a + b * c, the three-address code that is required to be generated is:

where t1 and t2 are pointers to the symbol table records that contain compiler-generated temporaries, and a, b, and c are pointers to the symbol table records that contain the programmer-defined names a, b, and c, respectively. Syntax-directed translation schemes to specify the translation of an expression into postfix notation are as follows:

where code is a string value attribute used to hold the postfix expression, and place is pointer value attribute used to link the pointer to the symbol record that contains the name of the identifier. The label getname returns the name of the identifier from the symbol table record that is pointed to by ptr, and concate(s₁, s₂, s₃) returns the concatenation of the strings s₁, s₂, and s₃, respectively. For the string a+b*c, the values of the attributes at the parse tree node are shown in Figure 1.

Figure 1: Values of attributes at the parse tree node for the string a + b * c.

id.place = addr(symtab rec of a)

Syntax-directed translation schemes to specify the translation of an expression into the syntax tree are as follows:

where ptr is pointer value attribute used to link the pointer to a node in the syntax tree, and place is pointer value attribute used to link the pointer to the symbol record that contains the name of the identifier. The mkleaf generates leaf nodes, and mknode generates intermediate nodes.

For the string a+b*c, the values of the attributes at the parse tree nodes are shown in Figure 2.

Figure 2: Values of the attributes at the parse tree nodes for a + b * c, id.place = addr(symtab rec of a).

id.place = addr(sumtab rec of a)

Syntax-directed translation schemes specify the translation of an expression into three-address code, as follows:

where ptr is a pointer value attribute used to link the pointer to the symbol record that contains the name of the identifier, mkleaf generates leafnodes, and mknode generates intermediate nodes. For the string a+b*c, the values of the attributes at the parse tree nodes are shown in Figure 3.

Figure 3: Values of the attributes at the parse tree nodes for a + b * c, id.place = addr(sumtab rec of a).

2 Boolean Expressions

One way of translating a Boolean expression is to encode the expression's true and false values as the integers one and zero, respectively. The code to evaluate the value of the expression in some temporary is generated as shown below:


E → E₁ relop E₂
          {
             t1 = gentemp();
             gencode(if E₁.place relop.val E₂.place
             goto(nextquad + 3));
             gencode(t1 = 0);
             gencode(goto(nextquad+2))
             gencode(t1 = 1)}
             E.place = t1;
          }

where nextquad keeps track of the index in the code array. The next statement will be inserted by the gencode procedure, and will update the value of nextquad. The following translation scheme:

translates the expression a < b to the following three-address code:

Similarly, a Boolean expression formed by using logical operators involves generating code to evaluate those operators in some temporary form, as shown below:

E → E1 lop E2
   {
     t1 = gentemp();
     gencode (t1 = E1.place lop.val E2.place);
     E.place = t1;
   }
E → not E1
   {
     t1 = gentemp();
     gencode (t1 = not E1.place)
     E.place = t1
   }
lop → and { lop.val = and}
lop → or { lop.val = or}

The translation scheme above translates the expressions a < b and c > d to the following three-address code:

Another way to translate a Boolean expression is to represent its value by a position in the three-address code sequence. For example, if we point to the statement labeled L1, then the value of the expression is true (1); whereas if we point to the statement labeled L2, then the value of the expression is false (0). In this case, the use of a temporary to hold either a one or zero, depending upon the true of false value of the expression, becomes redundant. This also makes it possible to decide the value of the expression without evaluating it completely. This is called a "short circuit" or "jumping the code". To discover the true/false value of the expression a<b or c>d, it is not necessary to completely evaluate the expression; if a<b is true, then the entire expression will be true. Similarly to discover the true/false value of the expression a<b and c>d, it is not necessary to completely evaluate the expression, because if a<b is false, then the entire expression will be false.

Tip�

Therefore a Boolean expression can be translated into two to three address statements, a conditional jump, and an unconditional jump. But the targets of these jumps are known at the time of translating a Boolean expression; hence, these jumps are generated without their targets, which are filled in later on.

Therefore, we must remember the indices of these jumps in the code array by using suitable attributes of E. For this, we use two pointer value attributes: E.true and E.false. The attribute E.true will hold the pointer to the list that contains the index of the conditional jump in the code array, whereas the attribute E.false will hold the pointer to the list that contains the index of the unconditional jump. The translation scheme for the Boolean expression that uses relational operators is as follows:

E → E₁ relop E₂
   {
     E.true = mklist(nextquad);
     E.false = mklist(nextquad + l);
     gencode (if E₁.place relop.val E₂.place goto);
     gencode (goto_);
   }

where mklist(ind) is a procedure that creates a list containing ind and returns a pointer to the created list.

The above translation scheme translates the expression a < b to the following three address code:

3 Short-Circuit Code for Logical Expressions

There are several methods to adequately handle the various elements of Boolean operators. These are covered by type below.

AND

Logical expressions that use the ‘and’ operator are expressions defined by the production E → E1 and E2. Generating the short-circuit code for these logical expressions involves setting the true value of the first expression, E1, to the start of the second expression, E2, in the code array. We make the true value of E the same as the true value of expression E2; and we make the false value of E the same as the false values of both E1 and E2. This requires remembering where E2 starts in the code array index, which means we must provision the memory of the nextquad value just before E2 is processed. This can accomplished by introducing a nullable nonterminal M before E2 in the above production, providing for a reduction by M→ ∈ just before the processing of E2. Hence, we can get a semantic action associated with this production and executed at this point. We therefore have a method for remembering the value of nextquad just before the E2 code is generated.

E → E₁ and M E₂                      {        backpatch(E1.true, M.quad);
                                E.true = E2.true;
                                E.false = merge(E1.false, E2.false);
                       }
M → ∈      {M.quad = nextquad; }

where backpatch(ptr,L) is a procedure that takes a pointer ptr to a list containing indices of the code array and fills the target of the statements at these indices in the code array by L.

OR

For an expression using the ‘or’ operator-that is, an expression defined by the production E → E1 or E2—generating the short-circuit code involves setting the false value of the first expression, E1, to the start of E2 in the code array, and making the false value of E the same as the false value of E2. The true value of E is assigned the same true value as both E1 and E2. This requires remembering where E2 starts in the code array index, which requires making a provision for remembering the value of nextquad just before the expression E2 is processed. This can achieved by introducing a nullable nonterminal M before E2 in the above production, providing for a reduction by M→ ∈ just before the processing of E2. Hence, we obtain a semantic action that is associated with this production and executed at this point; therefore, we have provisioned the recall of the value of nextquad just before the E2 code is generated.

E → E1 or M E2              {    backpatch(E1.false, M.quad);
                                 E.false = E2.false;
                                 E.true = merge(E1.true, E2.true);
                            }
M → ∈    {M.quad = nextquad; }

NOT

For the logical expression using the ‘not’ operator, that is, one defined by the production E → not E1, generating the short-circuit code involves making the false value of the expression E the same as the true value of E1. And the true value of E is assigned the false value of E1.


E → not E1   {
              E.true = E1.false
              E.false = E1.true
             }

The above translation scheme translates the expression a < b and c > d to the following three-address code:

For example, consider the following Boolean expression:

When the above translation scheme is used to translate this construct, the three-address code generated for it is as shown below, and the translation scheme is shown in Figure 3.

Figure 3: Translation scheme for a Boolean expression containing and, not, and or.

IF-THEN-ELSE

Since an if-then-else statement is composed of three components—a boolean expression E, a statement S1 that is to be executed when E is true, and a statement S2 that is to be executed when E is false—the translation of if-then-else involves making a provision for transferring control to the start of S1 if E is true, for transferring control to the start of S2 if E is false, and for transferring control to the next statement after the execution of S1 and S2 is over. This requires remembering where S1 starts in the index of the code array as well as remembering where S2 starts in the index of the code array.

This is achieved by introducing a nullable nonterminal M1 before the S1 and a nullable nonterminal M2 before the S2 in the above production, providing for the reduction by M1 → ∈ just before processing S1. Hence, we get a semantic action associated with this production and executed at this point, which enables the recall of the value of nextquad just before generating S1 code. Similarly, it provides for the reduction by M2 → ∈ just before processing S2, and we get a semantic action associated with production executed at this point, enabling the recall of the value of nextquad just before generating S2 code.

In addition, an unconditional jump is required at the end of S1 in order to transfer control to the statement that follows the if-then-else statement. To generate this unconditional jump, we add a nullable nonterminal N after S1 to the production and associate a semantic action with the production N → ∈, which takes care of generating this unconditional jump, as shown in Figure 4.


S → if E then M1 S1 N
               else M2 S2 {
                          backpatch (E.true, M1.quad)
                          backpatch (E.false, M2.quad)
                          S.next:
                              = merge (S1.next, S2.next, N.next)
                          }
M1 → ∈ { M1.quad = nextquad;}
M2 → ∈ { M2.quad = nextquad}
N →  ∈ {
                    N.next = mklist (nextquad);
                    gencode (goto...);
       }

Figure 4: The addition of the nullable nonterminal N facilitates an unconditional jump.

Hence, for the statement if a<b then x = y + z else p = q + r, the three-address code that is required to be generated is:

IF-THEN

Since an if-then statement is comprised of two components, a Boolean expression E and an S1 statement that will be executed when E is true, the translation of if-then involves making a provision for transferring control to the start of S1 code if E is either true or false, and a provision is made for transferring control to the next statement after the execution of S1 is over. This requires recalling the index of the start of S1 in the code array, and can be achieved by introducing a nullable nonterminal M before S1 in the production. This will provide for a reduction by M → ∈, just before the processing of S1. Hence, we can get a semantic action associated with this production and executed at this point, which makes a provisioning the recall of for remembering the value of nextquad just before generating code of S1 code is generated, as shown in Figure 5 below:


S → if E then M S1     {
                                 backpatch (E.true, M.quad);
                                 S.next = merge(E.false, S1.next)
                       }
M → ∈     { M.quad = nextquad; }

Figure 5: A nullable nonterminal M provisions the translation of if-then.

Hence, for the statement if a<b then x = y + z, the three-address code that is required to be generated is:

WHILE

Since a while statement has two components, a Boolean expression E and a statement S1, which is the statement to be executed repeatedly as long as E is true, the translation of while involves provisioning the transfer of control to the start of S1 code if E is true. The expression must be tested again after S1 execution is over, control must be transferred to the next statement if E is false. This requires remembering the index in the code array where S1 code starts as well as where the E code starts. This can be achieved by introducing a nullable nonterminal M2 before S1 in the production. This will provide for the reduction by M2 → ∈ just before the processing of S1. Hence, a semantic action is associated with this production and is executed at this point, enabling the recall of the value of nextquad just before generating S code. Similarly, introducing a nullable nonterminal M1 before E will provide for the reduction by M1 → ∈ just before the processing of E. Hence, a semantic action is now associated with this production and is executed at this point, provisioning the recall of the value of nextquad just before E code is generated, as shown in Figure 6.


 S → while M1 E
          do M2 S1  {
                       backpatch (E.true, M2.quad)
                       backpatch (S1.next, M1.quad)
                       S.next = E.false
                       gencode (goto(M1.quad))
                    }
M1 →∈ { M1.quad = nextquad; }
M2 →∈ { M2.quad = nextquad; }

Figure 6: The translation of the Boolean while statement is facilitated by a nullable nonterminal M.

Hence, for the statement while a<b do x = y + z, the three-address code that is required to be generated is:

DO-WHILE

Since a do-while statement is comprised of two components, a Boolean expression E and an S1 statement that is executed repeatedly as long as E is true (as well as the test for whether E is true or false at the end of S1 execution), translation involves provisioning the transfer of control to test the expression after the execution of S1 is over. Control must also be transferred to the start of S1 code if E is true, and conversely to the next statement if E is false.

This requires recalling the S1 start index in the code array as well as the E start index. We introduce a nullable nonterminal M1 before S1 in the production, providing for the reduction by M1 → ∈ just before the processing of S1. Hence, a semantic action is now associated with this production and is executed at this point, provisioning the recall of the value of nextquad just before S1 code generates. Similarly, introducing a nullable nonterminal M2 before E will provide for the reduction by M2 → ∈ just before the processing of E. We then have a semantic action associated with this production and executed at this point, and which provisions the recall of the value of nextquad just before E code generates, as shown in Figure 7.

 S → do M1 S1 while M2 E {
                                backpatch (E.true, M1.quad)
                                backpatch (S1.next, M2.quad)
                                S.next = F.false
                        }
M1 → ∈{ M1.quad = nextquad; }
M2 → ∈{ M2.quad = nextquad; }

Figure 7: Translation of the Boolean do-while.

Hence, for a statement do x = y + z while a<b, the three-address code that is required to be generated is:

REPEAT-UNTIL

Since a repeat-until statement has two components, a Boolean expression E and an S1 statement that is executed repeatedly until E becomes true (as well as the test of whether E is true or false at the end of S1), the translation of repeat-until involves provisioning transfer of control to a test of the expression after the execution of S1 is over. We must also engineer a transfer a control to the start code of S1 if E is false and to the next statement if E is true.

This requires recalling the index in the code array where S1 code starts as well as the index in the code array where E code starts. We achieve this by introducing a nullable nonterminal M1 before S1 in the production. This will provide for the reduction by M₁ → ∈, just before the processing of S1. Hence, we can get a semantic action that is associated with this production and is executed at this point. This makes a provision for remembering the value of nextquad just before S code generates, and introduces a nullable non-terminal M2 before E. This will provide for the reduction by M₂ → ∈, just before the processing of E. Now we can get a semantic action associated with this production and executed at this point, and which provisions the recall of the value of nextquad just before E code generates, as shown in Figure 8.

 S → repeat M1 S1
          until M2 E    {
                          backpatch (E.false, M1.quad)
                          backpatch (S1.next, M2.quad)
                          S.next = E.true
                        }
M1 → ∈{ M1.quad = nextquad; }
M2 → ∈{ M2.quad = nextquad; }

Figure 8: Translation of Boolean repeat-until.

Hence, for the Boolean statement repeat x = y + z until a<b, the three-address code that is required to be generated is:

FOR

A for statement is composed of four components: an expression E1, which is used to initialize the iteration variable; an expression E2, which is a Boolean expression used to test whether or not the value of the iteration variable exceeds the final value; an expression E3, which is used to specify the step by which the value of the iteration variable is to be incremented or decremented; and an S1 statement, which is the statement to be executed as long as the value of the iteration variable is less than or equal to the final value. Hence, the translation of a for statement involves provisioning the transfer a control to the start of S1 code if E2 is true, transferring control to the start of E3 code after the execution of S1 is over, transferring control to the start of E2 code after E3 code is executed, and transferring control to the next statement if E2 is false, as shown in Figure 9.


 S → for (E1; M1 E2; M2 E3) M3 S1
                 {
                                 backpatch (E2.true, M3.quad)
                                 backpatch (M3.next, M1.quad)
                                 backpatch (S1.next, M2.quad)
                                 gencode (goto(M2.quad))
                                 S.next = E2.false
                 }
M1 → ∈{ M1.quad = nextquad; }
M2 → ∈{ M2.quad = nextquad; }
M3 → ∈            {
                    M3.next: = mklist (nextquad)
                    gencode (goto...)
                    M3.quad = nextquad;
                  }

Figure 9: Handling the translation of the Boolean for.

Hence, for a statement for(i = 1; i< = 20; i+ +) x = y + z, the three-address code that is required to be generated is:

Other

Algorithms for Compiler Design: THE ARRAY REFERENCE

Algorithms for Compiler Design: SWITCH/CASE

EXAMPLES for Syntax-Directed Definitions and Translations

What is a cross-compiler?

What is compilation?

Regular Expression Notation/Finite Automata Definitions

Relations in a cross-compiler

Non-Deterministic Finite Automata

Finite Automata and Regular Expressions

Compiler Design: Transforming NFA To DFA