Home | Downloads | Documentation | Plugins | Spinoff Projects | Mailing List

Tutorial 1: Traversing the Tree

In Getting Started, we explained that phc represents PHP scripts internally as an abstract syntax tree, and that the structure of this tree is determined by the grammar. We then showed how to make use of this tree to count the number of classes. In this tutorial, we will consider an equally simple task: we want to count the number of function calls in a script. So, for the following PHP script,

<?php
   echo "Hello ";
   echo "world!";
?>

we should report two function calls.

Note that all the plugins that we will develop in these tutorials are included in the phc distribution. For example, in this tutorial we will be developing two plugins: a difficult solution to the problem and an easy solution to the problem. You can run these plugins by running

phc --run plugins/tutorials/count_function_calls_difficult.so hello.php
or
phc --run plugins/tutorials/count_function_calls_easy.so hello.php

The Grammar (Revisited)

How do we go about counting the number of function calls in a script? Remember that, as far as phc is concerned, a PHP script consists of a number of classes (and interface definitions). Each of these classes may have one or more methods, and each method can have one or more statements in them. Simplified, the grammar would state this as:

php_script ::= class_def+ ;
class_def ::= CLASS_NAME member* ; 
member ::= method | attribute ;
method ::= signature statement* ;
signature ::= METHOD_NAME formal_parameter* ; 

where the vertical bar (|) means “or”. Thus, our running example is represented by the following tree.

Simplified AST

(Note that this tree is simplified from the real tree; not all nodes are shown. You can also view the full tree).

Statements and Expressions

The two nodes that we are interested in are the “method invocation” nodes. The eval expr nodes just above them probably need some explanation. There are many different types of statements in PHP: if-statements, while-statements, for-loops, etc. You can find the full list in the grammar. If you do look at the grammar, you will notice in particular that a function call is not actually a statement! Instead, a function call is an expression.

The difference between statements and expressions is that a statement does something (for example, a for-loop repeats a bunch of other statements), but an expression has a value. For example, “5” is an expression (with value 5), “1+1” is an expression (with value 2), etc. A function call is also considered an expression. The value of a function call is the value that the function returns.

Now, the node eval expr makes a statement from an expression. So, if you want to use an expression where phc expects a statement, you have to use the grammar rule

statement ::= ... | eval_expr ;
eval_expr ::= expr ;

The Difficult Solution

The following plugin counts the number of function calls in a tree. If you do not understand the code, do not worry! We will look at a much easier solution in a second. If you understand the comments, that is enough.

#include <phc/ast.h>

extern "C" void process_ast(AST_php_script* php_script)
{
   AST_class_def_list::const_iterator ci;
   AST_member_list::const_iterator mi;
   AST_statement_list::const_iterator si;
   
   AST_method* method;
   AST_eval_expr* eval_expr;
   AST_method_invocation* invoc;
   
   int num_function_calls = 0;
   
   // Inspect all classes
   for(
      ci = php_script->class_defs->begin(); 
      ci != php_script->class_defs->end(); 
      ci++)
   {
      // Inspect all members in the class
      for(
         mi = (*ci)->members->begin(); 
         mi != (*ci)->members->end(); 
         mi++)
      {
         // Check whether this member is a method or an attribute
         method = dynamic_cast<AST_method*>(*mi);
         if(method == NULL) continue;
         
         // Check all statements in the method
         for(
            si = method->statements->begin(); 
            si != method->statements->end(); 
            si++)
         {
            // Check if the statement is of type "eval_expr"
            eval_expr = dynamic_cast<AST_eval_expr*>(*si);
            if(eval_expr == NULL) continue;
   
            // Finally, check if the expression is a function call
            invoc = dynamic_cast<AST_method_invocation*>(eval_expr->expr);
            if(invoc == NULL) continue;
   
            // Yeah! We found a function call
            num_function_calls++;
         }
      }
   }
   
   printf("%d function calls found\n", num_function_calls);
}

Why is this code so complicated? First of all, it has to search through the entire tree, looking for function calls. Because function calls are fairly deep down in the tree, we need a lot of code simply to find them. The second complication is the fact that, for example, a class consists of “members“. A member is either an attribute or a method, but we are interested only in methods. Similarly, a method consists of “statements“. A statement can be one of many things; but we are only interested in eval_expr statements. Thus, we have to test nodes for their type (using dynamic_cast).

The Easy Solution

Fortunately, phc will do all this for you automatically! There is a standard “do-nothing” tree traversal predefined in phc in the form of a class called Tree_visitor (defined in <phc/Tree_visitor.h>). Tree_visitor contains methods for each type of node in the tree. phc will automatically traverse the abstract syntax tree for you, and call the appropriate method at each node.

In fact, there are two methods defined for each type of node. The first method, called pre_something, gets called on a node before phc visits the children of the node. The second method, called post_something, gets called on a node after phc has visited the children of the node. For example, pre_method gets called on an AST_method, before visiting the statements in the method. After all statements have been visited, post_method gets called. Thus, the very first method that gets called is pre_php_script (because that is the top-level node in the tree), and the very last method that gets called is post_php_script.

So, here is an alternative and much easier solution for our problem.
#include <phc/Tree_visitor.h>

class Count_function_calls : public Tree_visitor
{
private:
   int num_function_calls;

public:
   // Set num_function_calls to zero before we begin
   void pre_php_script(AST_php_script* in)
   {
      num_function_calls = 0;
   }

   // Print the number of function calls when we are done
   void post_php_script(AST_php_script* in)
   {
      printf("%d function calls found\n", num_function_calls);
   }
   
   // Count the number of function calls
   void post_method_invocation(AST_method_invocation* in)
   {
      num_function_calls++;
   }
};

extern "C" void process_ast(AST_php_script* php_script)
{
   Count_function_calls cfc;
   php_script->visit(&cfc);
}

The real work in this transform is now done by the visitor; the only task left that process_ast still has to do is instanstiate the visitor and run it over the tree.

Counting All Statements

(The plugin explained in this section is available as plugins/tutorials/count_statements.so in the phc distribution.)

Suppose we wanted to count all statements, rather than just function calls. We could define methods for eval_expr, for, while, if, etc., but there is an easier way. If you check the grammar, you will find the following rules:

statement ::= if | ... ;
commented_node ::= ... | statement | ... ;
node ::= ... | node | ... ;

You could read this as “an if-statement is-a statement, a statement is-a commented node (a node that may have comments associated with it), and a commented node is-a node. This is-a relation is reflection using inheritance (class AST_if inherits from AST_statement), but the tree visitor API also has corresponding “generic” methods. For example, the following suffices to count all statements:

void pre_statement(AST_statement* in)
{
   num_statements++;
}

We need to be precise about the order in which phc calls all these methods. Suppose we have a node Foo (say, an if-statement), which is-a Bar (say, statement), which itself is-a Baz (say, commented node). Then phc calls the visitor methods in the following order:

  1. pre_baz
  2. pre_bar
  3. pre_foo
  4. children_foo (visit the children of foo)
  5. post_foo
  6. post_bar
  7. post_baz

Just to emphasise, if all of the visitor methods listed above are implemented, they will all be invoked, in the order listed above. So, implementing a more specific visitor (pre_foo) does not inhibit the more general method (pre_bar) from being invoked. You can run the plugins/tutorials/show_traversal_order.so from the phc distribution to see this in action.

What's Next?

To find out how you can modify the tree, continue with Tutorial 2.

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.