Tutorial 6: Returning Lists

In this tutorial we develop, step by step, a transform that expands include statements. For example, if b.php is

<?php
   echo "Hello world";
?>

and a.php is

<?php
   include "b.php";
   echo "Goodbye!";
?>

then running the transform on a.php yields

<?php
   echo "Hello world";
   echo "Goodbye!";
?>

The transform we will develop in this tutorial is only a simple implementation of includes; a more full-featured transform is available using the option --compile-time-includes. The code for that transform is process_ast/Process_includes.cpp. The transform we will develop here is available as plugins/tutorials/Expand_includes.so.

Deleting Nodes

Our transform should process include statements. In the AST, includes are represented as method invocations. Thus, we might start like this:

class Expand_includes : public Tree_transform
{
public:
   AST_expr* pre_method_invocation(AST_method_invocation* in)
   {
      // Process includes
   }
};

However, this will not get us very far. The return type of pre_method_invocation is an AST_expr. That means that we can replace the method invocation (the include statement) only by another, single, expression. But we want to replace it by the contents of the specified file!

Recall from Tutorial 1 that to turn an expression into a statement, phc inserts an AST_eval_expr in the abstract syntax tree. Thus, if we want to process include statements, we could also look at all eval_expr nodes. Assuming for the moment we can make that work, does it get us any further? As a matter of fact, it does! If you check <phc/Tree_transform.h>, you will see that the signature for pre_eval_expr is

void pre_eval_expr(AST_eval_expr* in, AST_statement_list* out)

This is different from the signatures we have seen so far. For nodes that can be replaced by a number of new nodes, the pre transform and post transform methods will not have a return value in their signature, but have an extra AST_xxx_list argument. This list is initialised to be empty before pre_eval_expr is invoked, and when pre_eval_expr returns, the nodes in this list will replace *in. If the list is empty, the node is simply deleted from the tree.
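To make the replace-by-list convention concrete, here is a minimal, self-contained sketch of how a driver could call such a method. It does not use phc at all: Statement, StatementList, Hook and transform_all are hypothetical stand-ins invented for this illustration, and phc's actual traversal code may well look different.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for AST_statement and AST_statement_list;
// these are NOT phc's classes.
struct Statement { std::string text; };
using StatementList = std::vector<Statement*>;

// A hook in the style of pre_eval_expr: it receives the node and an
// (initially empty) output list.
using Hook = std::function<void(Statement* in, StatementList* out)>;

// Rebuild a statement list by running the hook on every statement.
StatementList transform_all(const StatementList& statements, const Hook& hook)
{
   assert(static_cast<bool>(hook));  // a hook must be supplied

   StatementList result;
   for (Statement* in : statements)
   {
      StatementList out;             // empty before every invocation
      hook(in, &out);

      // Whatever the hook left in out replaces in: no nodes deletes it,
      // one node keeps or replaces it, several nodes expand it.
      result.insert(result.end(), out.begin(), out.end());
   }
   return result;
}
```

A hook with an empty body therefore deletes every statement, a hook that does out->push_back(in) leaves the list unchanged, and a hook that pushes several nodes expands one statement into many.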

So, we will use the following plugin as our starting point. Executing this plugin deletes all eval_expr nodes from the tree (try it!).

#include <phc/Tree_transform.h>

class Expand_includes : public Tree_transform
{
public:
   void pre_eval_expr(AST_eval_expr* in, AST_statement_list* out)
   {
   }
};

extern "C" void process_ast(AST_php_script* php_script)
{
   Expand_includes einc;
   php_script->transform(&einc);
}

Using the XML unparser

So, we now want to do something more useful than deleting all eval_expr nodes from the tree. The first thing we need to be able to do is distinguish include statements from other eval_expr nodes. We can use pattern matching (see tutorials 3 and 4) to do that - but what should we match against? If you are unsure about the structure of the tree, it can be quite useful to use the XML unparser to find out what the tree looks like. We modify the plugin as follows:

#include <phc/Tree_transform.h>
#include <phc/process_ast/XML_unparser.h>

class Expand_includes : public Tree_transform
{
private:
   XML_unparser xml_unparser;

public:
   void pre_eval_expr(AST_eval_expr* in, AST_statement_list* out)
   {
      in->visit(&xml_unparser);
   }
};

The XML unparser is implemented using the Tree_visitor API, so it can be invoked just like you run any other visitor. There is a similar visitor called PHP_unparser (in <phc/process_ast/PHP_unparser.h>) that you can use to print (parts of the) AST to PHP syntax.

When you run this transform on a.php, it will print two eval_expr nodes (shown in XML syntax), one for the include and one for the echo. We are interested in the first, the include (we have removed the <attrs /> blocks to improve readability):

<AST_eval_expr>
   <AST_method_invocation>
      <Token_class_name>
         <value>%STDLIB%</value>
      </Token_class_name>
      <Token_method_name>
         <value>include</value>
      </Token_method_name>
      <AST_actual_parameter_list>
         <AST_actual_parameter>
            <bool>false</bool>
            <Token_string>
               <value>b.php</value>
               <source_rep>b.php</source_rep>
            </Token_string>
         </AST_actual_parameter>
      </AST_actual_parameter_list>
   </AST_method_invocation>
</AST_eval_expr>

This tells us that the include statement is an eval_expr node (that was obvious from the fact that we implemented pre_eval_expr). The eval_expr contains a method_invocation (we knew that too). The method invocation has target %STDLIB%, method name include, and a single parameter in the parameter list that contains the name of the file we are interested in. We can construct a pattern that matches this tree exactly:

class Expand_includes : public Tree_transform
{
public:
   void pre_eval_expr(AST_eval_expr* in, AST_statement_list* out)
   {
      // Pattern to match include statements   
      Token_string* filename; 
      AST_actual_parameter* param;
      AST_actual_parameter_list* params;
      Token_method_name* method_name;
      Token_class_name* target; 
      AST_method_invocation* pattern;
      
      filename = new Token_string(WILDCARD, WILDCARD);
      param = new AST_actual_parameter(false, filename);
      params = new AST_actual_parameter_list();
      params->push_back(param);
      method_name = new Token_method_name(new String("include"));
      target = new Token_class_name(new String("%STDLIB%"));
      pattern = new AST_method_invocation(target, method_name, params);

      // Check we have a matching function
      if(!in->expr->match(pattern))
      {
         // No match; leave untouched
         out->push_back(in);
      }
      else
      {
         // Process the include
      }
   }
};

Note how the construction of the pattern follows the structure of the tree as output by the XML unparser exactly. The only difference is that we leave the actual filename a wildcard; obviously, we want to be able to match against any include, not just include("b.php"). Running this transform should remove the include from the file, but leave the other statements untouched (note that we need to push_back in onto out; otherwise every statement that is not an include would be deleted as well).
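If the idea of matching a tree against a pattern with a wildcard in it feels abstract, the following toy model may help. It is only a sketch under our own assumptions: Node and this match function are hypothetical and much simpler than phc's AST classes. In particular, the sketch assumes that a successful match stores the matched value in the wildcard node, so that it can be read back afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy tree node; a hypothetical stand-in, not one of phc's AST classes.
struct Node
{
   std::string kind;              // e.g. "method_invocation" or "string"
   std::string value;             // payload, used by leaf nodes
   bool is_wildcard;              // true: accept any value and capture it
   std::vector<Node*> children;
};

// Returns true if tree has the same shape and values as pattern.
// Wildcard nodes in the pattern accept any value and record it.
bool match(const Node* tree, Node* pattern)
{
   assert(tree != nullptr && pattern != nullptr);

   if (tree->kind != pattern->kind) return false;
   if (tree->children.size() != pattern->children.size()) return false;

   if (pattern->is_wildcard)
      pattern->value = tree->value;      // bind the wildcard
   else if (tree->value != pattern->value)
      return false;

   // All children must match pairwise, in order.
   for (std::size_t i = 0; i < tree->children.size(); i++)
      if (!match(tree->children[i], pattern->children[i])) return false;

   return true;
}
```

Matching a tree for include "b.php" against a pattern whose filename node is a wildcard succeeds and leaves "b.php" in the wildcard; a tree with a different method name fails at the first mismatching node.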

The Full Transform

Remember from the previous tutorials that code defined outside the scope of any class and any function becomes part of %MAIN%::%run% in phc's internal representation. So, to expand the include, we need to parse the specified file, and replace the include by all the statements in %MAIN%::%run% in the parsed script (we should also deal with the other functions of %MAIN%, and with any other classes or interfaces in the included script; this is left as an exercise for the reader). Here then is the full transform:

#include <phc/Tree_transform.h>
#include <phc/parse.h>

class Expand_includes : public Tree_transform
{
public:
   void pre_eval_expr(AST_eval_expr* in, AST_statement_list* out)
   {
      // Pattern to match include statements 
      Token_string* filename; 
      AST_actual_parameter* param;
      AST_actual_parameter_list* params;
      Token_method_name* method_name;
      Token_class_name* target; 
      AST_method_invocation* pattern;
      
      filename = new Token_string(WILDCARD, WILDCARD);
      param = new AST_actual_parameter(false, filename);
      params = new AST_actual_parameter_list();
      params->push_back(param);
      method_name = new Token_method_name(new String("include"));
      target = new Token_class_name(new String("%STDLIB%"));
      pattern = new AST_method_invocation(target, method_name, params);

      // Check we have a matching function
      if(!in->expr->match(pattern))
      {
         // No match; leave untouched
         out->push_back(in);
      }
      else
      {
         // Try to open the file
         AST_php_script* php_script = parse(filename->value, NULL, false); 
         if(php_script == NULL)
         {
            cout << "Could not parse file " << *filename->value;
            cout << " on line " << in->get_line_number() << endl;
            exit(-1);
         }

         // Replace the include by all statements in %MAIN%::%run%
         AST_class_def* main = php_script->get_class_def("%MAIN%");
         AST_method* run = main->get_method("%run%");
         out->push_back_all(run->statements);
      }
   }
};
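If you want to experiment with the control flow of this transform without building phc, the following self-contained sketch mimics it on a toy model. Everything in it is hypothetical: scripts are plain lists of statement strings, "files" live in a std::map instead of on disk, and is_include, expand_statement and expand_script are names we made up for this illustration. It also reports a missing file through its return value rather than by exiting.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (not phc): a script is a list of statement strings and
// the "file system" is a map from file name to script.
using Script = std::vector<std::string>;
using Files = std::map<std::string, Script>;

// If statement has the form  include "NAME";  store NAME in *filename
// and return true; otherwise return false.
bool is_include(const std::string& statement, std::string* filename)
{
   assert(filename != nullptr);
   const std::string prefix = "include \"";
   const std::string suffix = "\";";
   if (statement.size() < prefix.size() + suffix.size()) return false;
   if (statement.compare(0, prefix.size(), prefix) != 0) return false;
   if (statement.compare(statement.size() - suffix.size(),
                         suffix.size(), suffix) != 0) return false;
   *filename = statement.substr(
      prefix.size(), statement.size() - prefix.size() - suffix.size());
   return true;
}

// Counterpart of pre_eval_expr: push the replacement for in onto out.
// Returns false if the included file is unknown.
bool expand_statement(const std::string& in, const Files& files, Script* out)
{
   std::string filename;
   if (!is_include(in, &filename))
   {
      out->push_back(in);              // no match; leave untouched
      return true;
   }

   Files::const_iterator found = files.find(filename);
   if (found == files.end()) return false;

   // Replace the include by all statements of the included script
   out->insert(out->end(), found->second.begin(), found->second.end());
   return true;
}

// Expand every statement of script into result (one level of includes only).
bool expand_script(const Script& script, const Files& files, Script* result)
{
   for (const std::string& in : script)
      if (!expand_statement(in, files, result)) return false;
   return true;
}
```

Feeding it the two statements of a.php, with b.php registered in the map, yields the two echo statements from the example at the start of this tutorial. Statements taken from the included script are not expanded again in this sketch.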

What's Next?

This is the last tutorial in this series on using the Tree_visitor and Tree_transform classes. Of course, there is no substitute for experimentation: if you really want to understand how things work, you should implement your own transforms. Hopefully, the tutorials will help you do so. The following sources should also be useful:

  • The grammar specification (and the definition of the grammar formalism)
  • The explanation of how PHP gets represented in the abstract syntax
  • The definition of the C++ classes for the AST nodes in <phc/ast.h>
  • The definition of the Tree_visitor and Tree_transform classes in <phc/Tree_visitor.h> and <phc/Tree_transform.h> respectively

And of course, we are more than happy to answer any other questions you might still have. Just send an email to the mailing list and we'll do our best to answer you as quickly as possible! Happy coding!

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.