A + Bis this DAG:
A → + ← BThe edges leaving operand A and B are outgoing. This is always the case since operands must "happen-before" the operator. Operators, on the other hand, can have both incoming and outgoing edges. In this example, the + operator's incoming edges are from its left and right operand. Operators in an expression with more than one operator will have both incoming and outgoing edges; e.g., one expression becomes an operand to another expression. For example, assuming + is left associative, the expression
A + B + Cis this DAG, where the leftmost + has both incoming and outgoing edges:
B
↓
A → + → + ← C
Since a DAG is acyclic it
can be drawn as a tree rooted in the last
operation performed, where
the arrows are implicitly directed up from the leaves towards the
root of the tree.
The
vertices of the graph are tree nodes and the graph edges are tree
branches directed from the leaves up to the root. Since the
root of the tree denotes the last operation performed,
an LVR (postorder) traversal of the tree is a valid
topological sort of the DAG and correctly represents the
semantics of the expression in terms of order of operations.
The above DAG is the tree shown below. A postorder traversal results
in (A + B ) + C:
+
/ \
+ C
/ \
A B
Since operands can have outgoing edges only, operands
will always be the leaves of the tree and operators the internal nodes. The
order of operations corresponds to the order in which
synthesized attributes of a syntax tree are decorated in bottom-up
LALR (Look-Ahead Left-to-right scan Rightmost derivation)
parsing.
The big advantage to using DAGs for expressions is that common subexpressions can be identified and combined. Take, for example, this expression
a + a * (b - c) + (b - c) * d
To demonstrate the technique of combining,
first combine the common sub-expression
(b - c) into a single sub-expression (e.g., a sub-DAG).
Do the same for common expression 'a'.
When finished, there should be 9 vertices:
6 operators + 5 operands - 2 common expressions.
The DAG below is the above expression shown in tree form,
where the root + is the last operation performed.
Note that the two * operators point to the same
sub-DAG. The + and * in the left subtree also point to the terminal 'a'
since both have 'a' as a left operand.
+
/ \
+ *
/ \ /\
\ * / d
\ / \/
a -
/ \
b c
A postorder traversal of the tree will
give you the same topological sort of the DAG as you had in the
original tree, thus verifying the validify of this approach:
a + a * (b - c) + (b - c) * d
The advantage to combining sub-DAGs is
that you will generate fewer computer instructions since you can 'reuse'
previous computations.
t_1 = y * z
t_2 = x + t_1
The use of names for intermediate values (t_1 and t_2) make three-address
code easy to rearrange. The above DAG for this expression
a + a * (b - c) + (b - c) * d
results
in the 3-address code shown below, generated by a topological
sort which is a depth-first left to right traversal of the abstract syntax
tree:
+
/ \
+ *
/ \ ) \
( * / d
\ / \/
a -
/ \
b c
The first sub-expression is b - c:
SORT Three-Address Code
d c b - * a * a + + t_1 = b - c
-----
+
/ \
+ *
/ \ / \
/ * / d
\ / \/
a t_1
d t_1 * a * a + + t_2 = t_1 * d
-------
+
/ \
+ t_2
/ \
/ *
\ / \
a t_2
t_2 a * a + + t_3 = t_2 * a
-------
+
/ \
+ t_2
/ \
a t_3
t_3 a + + t_4 = t_3 + a
+
/ \
t_4 t_2
t_4 +
(right opand for + is t_2) t_5 = t_4 + t_2
The final result is:
1. t1 = b - c
2. t2 = t1 * d
3. t3 = t2 * a
4. t4 = t3 + a
5. t5 = t4 + t2
1. x = y op z // binary ops
2. x = op y // unary ops
3. x = y // assignments
4. goto L // unconditional jumps
5. if x goto L // conditional jumps
6. if x relop y goto L // conditional jumps by relational ops
7. param x, call p,n
y = call p // procedure calls and returns
8. x = y[i] // support for non-scalars
9. x = &y, x = *y *x=y // support for pointers
Labels can be assigned symbolically (e.g. goto label) or by
a position (e.g. like original Fortran or Basic):
100: t_1 = i + 1
104: if t_3 < v goto 100
A 3-address code is stored in a structure (i.e., an object or a record)
that is attached to the node in the abstract syntax tree (AST)
pertaining to that code. For example, for the assignment statement
a = b * -c + b * c
the AST and the 3-address code are shown here:
=
/ \ t_1 = minus c
a + t_2 = b * t_1
/ \ t_3 = b * c
* * t_4 = t_2 + t_3
/ \ / \ a = t_4
b - b c
|
c
There are three ways to represent the 3-address code:
// Method 1. Quadruple Method
op arg_1 arg_2 result
0 - c t_1 // unary minus
1 * b t_1 t_2
2 * b c t_3
3 + t_2 t_3 t_4
4 = t_4 a
The disadvantage to quadruples is space - every operation requires
temporary storage space for a result. This problem
is solved by the triple method.
// Method 2. Triple Method
op arg_1 arg_2
0 - c
1 * b (0)
2 * b c
3 + (1) (2)
4 = a (3)
The disadvantage to triples is that an optimizing compiler cannot easily
rearrange instructions.
Method 3, the Indirect Method, provides the advantages of method #1 and method #2. In this method you add a list of pointers to triples. An optimizing compiler can move instructions by reordering the instruction list, without affecting the triples.
decl -> type ID; decl | e type -> INT | FLOAT
Storage Layout for Local Names.
Every static object (e.g., identifiers that are declared at compile time)
must be bound to an address by the compiler, which
means that the compiler must know the size of the object at compile time.
The storage layout for data objects
is heavily influenced by the hardware of the target machine.
The size of scalars (int, char, float, double, etc.)
and how data are aligned is controlled by the instruction set architecture
of target machine.
But the compiler front-end is designed to
be platform independent. So how does this work? The compiler front-end
will make decisions based on defaults that are stored in compiler header files.
See /usr/include/limits.h" and
/usr/include/sys/types.h" for GNU C defaults.
Gnu C also supports a number of command line options for changing the
size of pre-defined data types, setting alignment, and other
machine dependent options
(see gcc manpage).
The sizeof() subroutine in glib.c uses these defaults for user applications
(See gcc/c-common.c").
Type synthesis builds up the type of an expression from the types in subexpressions (an attribute grammer). C uses this. A type system that support widening coercions allows the compiler to implicit cast one type to another as long as information is not lost; e.g.
int i = 5;
char c = 'a';
c = i; // NOT OK
i = c; // OK
Java DesignType inference determines type by context. This is the design of JavaScript for the most part.