Home | Downloads | Documentation | Plugins | Spinoff Projects | Mailing List

Overview of the AST classes and transformation API

This document explains the code for the AST classes, tree visitor API and tree transformation API. All this code is generated by a tool called maketea. It does not explain how this code is derived from the phc grammar; some of the details of this process are explained in the maketea theory.

The AST classes

There are two main kinds of AST classes: classes that correspond to non-terminals in the grammar, and classes that correspond to terminals in the grammar. These two kinds are called AST_xxx and Token_xxx respectively. Examples are AST_while, AST_expr, Token_method_name and Token_int.

The main difference is that terminal classes have one additional field (and sometimes two). Every token class gets an attribute called value. The type of this attribute depends on the token; for most tokens it is String* (this is the default); however, if the grammar explicitely specifies a type for the value (in angular brackets, for example REAL<double>), this overrides the default. If the default value is overridden, the token class gets an additional attribute source_rep, which corresponds to the value of the token in the source value. The type of source_rep is always String*. For example, the real number 5E-1 might have value set to the double 0.5, but source_rep set to “5E-1”. Similarly, a string __FILE__ might have value set to /home/joe/myscript.php, but source_rep set to __FILE__. If the type of the value attribute it set to be empty, the token class does not get a value but (but it will get a source_rep field). This is the case for Token_null<> in the phc grammar.

In addition, all the tokens classes have a method called get_value_as_string() and a method get_source_rep when applicable. This is useful for programs that operate on general AST_identifier objects (such as Token_method_name or Token_class_name) or AST_literal (such as Token_real or Token_int). Note that the value returned by get_value_as_string() and get_source_rep() may be different; for example, get_source_rep() might return 0.5E-1, while get_value_as_string() might return 0.5.

All (non-terminal and terminal) then provide the following methods for deep equality, pattern matching, cloning, calling a tree visitor and calling a tree transformer. These methods are explained separately in sections below.

Deep Equality

Deep equality is implemented by bool deep_equals(Object* other). It takes into account the entire tree structure generated by maketea, including any fields that are specified in the mixin code in the grammar. Thus, deep_equals also compares line numbers, comments, etc.

Cloning

Cloning is implemented by deep_clone. Cloning makes a (deep) copy of a tree, so the set of all pointers in the new tree is completely distinct from the set of pointers in the old tree. The only exception to this rule is that cloning the WILDCARD objects (see pattern matching, below) returns the WILDCARD object itself.

Pattern Matching

Pattern matching is implemented by bool match(Object* pattern). Pattern matching differs from deep equality in two ways. First, it does not take into account any fields added by the mixin code; for example, it does not compare line numbers or comments.

Second, match supports the use of wildcards. maketea generates a special class called Wildcard. You should never instantiate this class directly; in <phc/ast.h>, you will find the following declaration:

extern Wildcard* WILDCARD;

This WILDCARD is the sole instance of Wildcard. When match encounters a reference to this object in a pattern, it does two things: it skips that field in the comparison (so it acts as a “don't care”), and it replaces the value of the field in the pattern by the value in the tree. For example, in the body of the if in

Token_class_name* name = new Token_class_name(new String("SOME_CLASS"));
Token_class_name* pattern = new Token_class_name(WILDCARD);

if(name->match(pattern))
{
   // ...
}

pattern->value will be set to the corresponding value in name. Tutorials 3 and 4 include examples of the use of wildcards.

Calling any methods on the WILDCARD object other than deep_clone will lead to a runtime error.

The Tree Visitor API

sequence diagram

Every AST class provides four methods to support the tree visitor API: void visit(Tree_visitor*), void pre_visit(Tree_visitor*), void visit_children(Tree_visitor*) and void post_visit(Tree_visitor*). The implementation of each of these methods is very simple.

visit simply calls pre_visit, visit_children and post_visit in order. It could have been implemented once and for all in the AST_node class (but is not, for no particular reason).

For a node x0, which inherits from x1, which inherits from x2, which in turn inherits from x3, etc., x0::pre_visit calls pre_x3, pre_x2, pre_x1 and pre_x0, in that order, on the tree visitor object, passing itself as an argument. If x0 inherits from multiple classes, all of the appropriate visitor methods will be invoked. However, if x0 inherits from both x1a and x1b, the programmer should not rely on the relative order of pre_x1a and pre_x1b.

x0::visit_children simply calls children_x0.

x0::post_visit will call post_x0, post_x1, etc. Again, if x0 inherits from both x1a and x1b, the programmer should not rely on the relative order of post_x1a and post_x1b. The only guarantee made by maketea is that the order of the pre-methods will be the exact reverse of the order of the post-methods.

The Tree Transform API

sequence diagram

Every AST class AST_foo, which inherits from AST_gen_foo provides four methods to support the tree visitor API: AST_gen_foo* transform(Tree_transformer*), AST_gen_foo* pre_transform(Tree_transformer*), void transform_children(Tree_transformer*) and AST_gen_foo* post_transform(Tree_transformer*). It is not entirely as straightforward as this; if AST_foo inherits from more than one class, the return type would probably be AST_foo; in some cases, transform might return a AST_foo_list instead. See the section on context resolution in the grammar formalism for details; here we consider the programmer's perspective only. The exact signatures for a particular class can always be found in generated/ast.h.

As with the tree visitor API, transform calls pre_transform, transform_children and post_transform. However, while transform calls pre_transform on itself, it calls transform_children and post_transform on the node returned by pre_transform. If pre_transform returns a vector, transform calls transform_children and post_transform on every element in that vector, assembling all the results.

pre_transform and post_transform simply call the appropriate method in the Tree_transform object. However, if pre_transform (or post_transform) returns a list of nodes, the corresponding method in the tree transform object will expect two arguments: the node to be transformed, and an empty list of nodes that will be the return value of pre_transform. In that case, pre_transform will first create a new empty list, pass that in as the second argument to the corresponding method in the tree transform object, and then return that list.

transform_children just calls the corresponding method in the tree transform object.

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.