Algorithms for Compiler Design: THE ARRAY REFERENCE

7/24/2010 8:01:40 PM

6.9 THE ARRAY REFERENCE

An array reference is an expression with an l-value. Therefore, to capture its syntactic structure, we add the following productions to the grammar:

An array reference in a source program is replaced by the l-value of an expression that specifies the arrayreference to an element of the array. Computing the l-value involves finding the offset of the referred element of the array and then adding it to the base. But since deriving an offset depends on the subscripts used in an array reference, and the values of these subscripts are not known during the compilation, unless the subscripts are constant expressions, a compiler has to generate the code for evaluating the l-value of an expression that specifies the reference to an element of an array. This l-value computation is achieved as follows:

where lbi and ubi are the lower and upper bounds of the ith dimension.

If the lower bound of each dimension is one, and the upper bound of the ith dimension is di, then the offset computing formula becomes:

The [i1*d2*d3*…*dk + i2*d3*d4*…*dk +…+ ik]*bpw is a variable part of the offset computation, whereas [d2* d3*…*dk + d3*d4*…*dk +…+dk]*bpw is a constant part of the offset computation and is not required to be computed for every reference to an array a. It can be computed once while processing the declaration of the array a. We call this value "constant C". Therefore:

where V is the variable part, and

Since addr(a) is fixed, we can combine C with addr(a) and store this value in an attribute, L.place, and we can store V in another attribute, L.off, so that:

Hence, the translation of an array reference involves generating code for computing V, and V is made a value of attribute L.off. We compute addr(a) − C and make it the value of the attribute L.place. Computing V involves evaluating the expression:

This expression can be rewritten as:

Therefore, the three-address code that is required to be generated for computing V is:

Therefore, the translation scheme is:

elist →E           (Initialize queue by adding E.place)
elist → elist1, E  (Append E.place to queue)
L → id[elist]      { T1: = gentemp ( )
                                elist.Ndim = 1
                    gencode(T1 = retrieve();
                    while (queue not empty ) do
                    {
                               gencode (T1= T1 * limit (id.place, elist.Ndim))

                               gencode (T1 : = T1 + retrieve())
                               elist.Ndim = elist.Ndim + 1
                 }
                 V = gentemp();
                 U = gentemp();
                 gencode (V : = T1 * bpw)
                 gencode (U : = id.place - C)
                 L.off = V
                 L.place: = U
                 }

where retrieve() is a function that retrieves a value from the queue, and limit() returns the upper bound of the dimension of the array.

In this translation scheme, the attribute id.place cannot be accessed in the semantic action associated with the production elist → E or in the semantic action associated with the production elist → elist l, E. So it is not possible to make use of the value of the subscript that is available in E.place to get the required three-address statements generated. Hence, a queue is necessary in order to maintain the subscripts' storage. These subscripts are used later on for generating the code for computing the offset.

Another way to approach this is to modify the grammar to make it suitable for translation. This requires rewriting the productions in such a manner that both id and E exist in the same production so that the pointer to the symbol table record of the array name is available in id.place. This can be used to retrieve the upper-bound dimension information of the array. And the value of the subscript is available E.place; so by using both of these, the required three-address statements can be generated, and the value of the subscript does not need to be stored. Therefore, the modified grammar, along with the semantic actions, is:

L →elist  {           U = newtemp(); V = newtemp()
                      V = elist.place * bpw
                      U = gencode (elist.array - C)
                      L.place = U
                      L.off = V
          }
elist →id             E {elist.place = E.place
                      elist.array = id.place
                      elist.Ndim = 1; }
elist → elist, E         {     T1 = newtemp ();
                                   gencode (T1 = elist.place *
                                   limit (elist.array, elist.Ndim +1))
                                   gencode (T1 = T1 + E.place)
                                   elist.array = elist1.array
                                   elist.place = T1,
                                   elist.Ndim = elist.Ndim +1
                          }

For example, consider the following assignment statement:

where a and b are arrays of size 30 � 40, and c and d are arrays of size 20.

There are four bytes per word, and the arrays are allocated statically. When the above translation scheme is used to translate this construct, the three-address code generated is:

Other

Algorithms for Compiler Design: SWITCH/CASE

EXAMPLES for Syntax-Directed Definitions and Translations

What is a cross-compiler?

What is compilation?

Regular Expression Notation/Finite Automata Definitions

Relations in a cross-compiler

Non-Deterministic Finite Automata

Finite Automata and Regular Expressions

Compiler Design: Transforming NFA To DFA

Algorithms for Compiler Design: THE NFA WITH ∈-MOVES