| Home | Downloads | Documentation | Plugins | Spinoff Projects | Mailing List |
maketea Theorymaketea is a tool bundled with phc which, based on a grammar definition of a language, generates a C++ hierarchy for the corresponding abstract syntax tree, a tree transformation and visitor API, and deep cloning, deep equality and pattern matching on the AST. In this document we describe the grammar formalism used by maketea, how a C++ class structure is derived from such a grammar, and explains how the tree transformation API is generated. The generated code itself is explained in another document. The Grammar FormalismThe style of grammar formalism used by maketea is sometimes referred to as an “object oriented” context free grammar. It facilitates a trivial and reliable mapping between the abstract grammar, and the actual (C++) abstract syntax tree (AST) that is generated by the phc parser. We make a distinction between three types of symbols: non-terminal
symbols, terminal symbols and markers. Non-terminal symbols have
the same function in our formalism as in the usual BNF formalism, and will not
be further explained. We denote non-terminal symbols in lower case in the
grammar (e.g., The distinction between terminal symbols and markers is non-standard.
Markers have no semantic value other than their presence; an example is
Each non-terminal symbol A terminal symbol Finally, a marker will not be represented by a specialised class. Instead,
a marker
There are only two types of rules in the grammar. The first is the simplest,
and list a number of alternatives for a non-terminal symbol a ::= b | c | ... | z Here, each of The second type is the most common: a ::= b c ... z In this rule, each of the As an example, consider the rule for expr ::= ... | variable | ... ; variable ::= target? variable_name array_indices:expr?* string_index:expr? ; A
class AST_variable : virtual public AST_expr
{
public:
AST_target* target;
AST_variable_name* variable_name;
AST_expr_list* array_indices;
AST_expr* string_index;
}
A final note on combining Context ResolutionWe also derive the tree visitor API and tree transformation API from the grammar. The tree visitor API is very simple to derive; see the overview of the generated code for an explanation. The tree transformation API however is slightly more difficult to derive. The problem is to decide the signatures for the transform methods, or in other words, what can transform into what? For example, in the phc grammar for PHP, the transform for an if-statement should be allowed return a list of statements of any kind (because it is safe to replace an if-statement by a list of statements). Similarly, a binary operator should be allowed return any other expression (but not a list of them). For reasons that will become clear very soon, we call the process of deciding these signatures “context resolution”. ContextsA context is essentially a use of a symbol somewhere in a (concrete) rule in the grammar. There are four possibilities. Consider: concrete1 ::= ... concrete2 ::= ... concrete3 ::= ... concrete4 ::= ... concrete5 ::= ... concrete6 ::= ... abstract1 ::= concrete3 | concrete4 abstract2 ::= concrete5 | concrete6 some_concrete_rule ::= concrete1 concrete2* abstract1 abstract2* then, based on the rule for some_concrete_rule, concrete1 occurs in the context (concrete1,concrete1,Single) - i.e., as a single instance of itself, concrete2 occurs in the context (concrete2,concrete2,List), i.e. as a list of instances of itself. The use of the abstract1 class leads to a number of contexts: (abstract1,abstract1,Single) (concrete3,abstract1,Single) (concrete4,abstract1,Single) And finally, the use of abstract2* yields to the contexts (abstract2,abstract2,List) (concrete5,abstract2,List) (concrete6,abstract2,List) These contexts essentially mean that an instance of concrete5 can be replaced by any number of any (concrete) instance of "abstract2". Reducing ContextsIf there are two or more conflicting contexts for a single symbol, we must resolve the contexts to their most specific (restrictive) form. For instance, for the phc grammar, this yields (if,statement,List) (CLASS_NAME,CLASS_NAME,Single) (INTERFACE_NAME,INTERFACE_NAME,Single) So, a context is a triplet (symbol,symbol,multiplicity), where the symbols
are terminal or non-terminal symbols, and the multiplicity is either Single,
Optional, List, OptionalList or ListOptional (list of optionals). When
reducing two contexts ( To see the reason for taking the meet, consider this fragment of the phc grammar: expr ::= ... | BOOL cast ::= CAST expr method_invocation ::= target ... target ::= expr | CLASS_NAME The use of "expr" in the rule for cast leads to the context (BOOL,expr,Single) The use of "target" in the rule for method_invocation leads to the context (BOOL,target,Single). By taking the meet of "expr" and "target", this gives the context (BOOL,expr,Single). This means that it is always safe to replace a boolean by any other expression (but it is not always safe to replace a boolean by any other target). In the case of CLASS_NAME, we have the contexts (CLASS_NAME,class_name,Single) (CLASS_NAME,target,Single) The meet of class_name and target does not exist; hence this gives the context (CLASS_NAME,CLASS_NAME,Single) That is, the only safe transformation for CLASS_NAME is from CLASS_NAME to CLASS_NAME. To be precise about the “most specific” multiplicity, here is a Haskell definition that returns the meet of two multiplicities: meet_mult :: Multiplicity -> Multiplicity -> Multiplicity meet_mult a b | a == b = a meet_mult Single _ = Single meet_mult List Optional = Single meet_mult List OptList = List meet_mult List ListOpt = List meet_mult Optional OptList = Single meet_mult Optional ListOpt = Optional meet_mult OptList ListOpt = List meet_mult a b = meet_mult b a -- meet is commutative Resolution for DisjunctionsWe cannot deal with this situation: s ::= a a ::= b | c d ::= b e ::= c* This grammar leads to the following contexts: (a,a,Single) (b,a,Single) (b,b,Single) (c,a,Single) (c,c,List) Resolving these contexts lead to (a,a,Single) (b,b,Single) (c,c,List) However, this is incorrect, because this indicates that an Otherwise, for a rule |
| $LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors. |