Tutorial 4: Using State

This tutorial explains an advanced feature of pattern matching, and shows an important technique in writing tree transforms: the use of state. Suppose we are continuing the refactoring tool that we began in Tutorial 2, and suppose that we have replaced all calls to database-specific functions by calls to the generic DBX functions. To finish the refactoring, we want to rename any function foo in the script to foo_DB if it makes use of the database. This clearly sets functions that use the database apart, which may make the structure of the script clearer.

So, we want to write a transform that renames every function foo to foo_DB if that function contains one or more calls to any dbx_something function. Here is a simple example:

<?php
   function first()
   {
      global $link;
      $error = dbx_error($link);
   }

   function second()
   {
      echo "Do something else";
   }
?>

After the transform, we should get:

<?php
   function first_DB()
   {
      global $link;
      $error = dbx_error($link);
   }

   function second()
   {
      echo "Do something else";
   }
?>

The Implementation

Since we have to modify method (function) names, the nodes we are interested in are the nodes of type AST_method. However, how do we know when to modify a particular method? Should we search the method body for function calls to dbx_xxx? As we saw in Tutorial 1, manual searching through the tree is cumbersome; there must be a better solution.

The solution is in fact very easy. At the start of each method, we set a variable uses_dbx to false. When we process the method, we set uses_dbx to true when we find a function call to a DBX function. Then at the end of the method, we check uses_dbx; if it was set to true, we modify the name of the method. This tactic is implemented by the following transform (available as plugins/tutorials/InsertDB.so in the phc distribution). Note the use of pre_method and post_method to initialise and check uses_dbx, respectively. (Because we don't need to modify the structure of the tree in this transform, we use the simpler Tree_visitor API instead of the Tree_transform API.)

class InsertDB : public Tree_visitor
{
private:
   bool uses_dbx;
   
public:
   void pre_method(AST_method* in)
   {
      uses_dbx = false;   
   }

   void post_method(AST_method* in)
   {
      if(uses_dbx)
         in->signature->method_name->value->append("_DB");
   }

   void post_method_invocation(AST_method_invocation* in)
   {
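      // A wildcard pattern; a successful match binds it to the actual name
      Token_method_name* pattern = new Token_method_name(WILDCARD);
      
      // Check that the name is a plain method name starting with dbx_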
      if(in->method_name->match(pattern) && 
         pattern->value->find("dbx_") == 0)
      {
         uses_dbx = true;
      }
   }
};
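
To experiment with the reset/set/check pattern without building phc, the same idea can be modelled on a toy tree. The following is only a sketch under stated assumptions: Call, Method, InsertDBSketch, visit, rename_db_methods and insert_db_example are hypothetical names invented for illustration and are not part of the phc API; only the pre_method / post_method_invocation / post_method shape is taken from the transform above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the AST; these are NOT phc's classes.
struct Call   { std::string name; };
struct Method { std::string name; std::vector<Call> body; };

// Mirrors the shape of InsertDB: the state is reset in pre_method,
// set while the body is visited, and checked in post_method.
class InsertDBSketch
{
   bool uses_dbx = false;

public:
   void pre_method(Method&) { uses_dbx = false; }

   void post_method_invocation(const Call& c)
   {
      // find returns 0 only if c.name starts with "dbx_"
      if(c.name.find("dbx_") == 0)
         uses_dbx = true;
   }

   void post_method(Method& m)
   {
      if(uses_dbx)
         m.name += "_DB";
   }

   // Hypothetical driver: pre-visit, then the children, then post-visit.
   void visit(Method& m)
   {
      pre_method(m);
      for(const Call& c : m.body)
         post_method_invocation(c);
      post_method(m);
   }
};

// Runs the sketch over every method of a toy "script".
inline void rename_db_methods(std::vector<Method>& script)
{
   InsertDBSketch visitor;
   for(Method& m : script)
      visitor.visit(m);
}

// Usage example, modelled on the PHP script at the top of this tutorial.
inline void insert_db_example()
{
   std::vector<Method> script(2);
   script[0].name = "first";
   script[0].body.push_back(Call{"dbx_error"});
   script[1].name = "second";
   script[1].body.push_back(Call{"echo"});

   rename_db_methods(script);

   assert(script[0].name == "first_DB");
   assert(script[1].name == "second");   // state was reset by pre_method
}
```

Resetting the flag in pre_method is what keeps the result for first from leaking into second; without it, every method after the first database user would be renamed too.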

In Tutorial 2, we simply wanted to check for a particular function name, and we used match to do this:

if(in->method_name->match(new Token_method_name("mysql_connect")))

Here, we need to check for method names that start with dbx_. We use the STL method find to do this (find returns 0 exactly when the name begins with dbx_), but we cannot call it directly on in->method_name, because in->method_name has type AST_method_name (it could be either a Token_method_name or an AST_reflection node). However, calling match on a pattern has the side effect of making the pattern equal to the tree we are matching on, by replacing all wildcards with their corresponding values in the tree. So, after calling match in the transform, we can call find on the pattern (which has the right type) instead of directly on in->method_name. (match has this side effect only if the match succeeds.)
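
The side effect described above can be pictured with a small self-contained model. This is a sketch under assumptions, not phc's implementation: NameNode, Reflection, MethodName, match, name_starts_with and match_example are hypothetical stand-ins (playing the roles of AST_method_name, AST_reflection and Token_method_name), and the real match and WILDCARD may well work differently internally.

```cpp
#include <cassert>
#include <string>

// Toy model only; these are NOT phc's classes.
struct NameNode                      // plays the role of AST_method_name
{
   virtual ~NameNode() {}
};

struct Reflection : NameNode {};     // plays the role of AST_reflection

struct MethodName : NameNode         // plays the role of Token_method_name
{
   std::string value;
   bool is_wildcard;

   MethodName() : is_wildcard(true) {}   // a wildcard pattern
   explicit MethodName(const std::string& v) : value(v), is_wildcard(false) {}
};

// Returns true if node is a MethodName compatible with pattern.
// On success, a wildcard pattern is bound to the matched value;
// on failure, the pattern is left untouched.
inline bool match(const NameNode* node, MethodName* pattern)
{
   const MethodName* name = dynamic_cast<const MethodName*>(node);
   if(name == nullptr)
      return false;                  // e.g. a Reflection node
   if(pattern->is_wildcard)
   {
      pattern->value = name->value;  // the "side effect"
      pattern->is_wildcard = false;
      return true;
   }
   return pattern->value == name->value;
}

// True if node is a plain method name that starts with prefix.
inline bool name_starts_with(const NameNode* node, const std::string& prefix)
{
   MethodName pattern;               // a fresh wildcard for each check
   return match(node, &pattern) && pattern.value.find(prefix) == 0;
}

// Usage example: a plain name matches, a reflection node does not.
inline void match_example()
{
   MethodName call("dbx_error");
   Reflection dynamic_call;

   assert(name_starts_with(&call, "dbx_"));
   assert(!name_starts_with(&dynamic_call, "dbx_"));
}
```

Because the pattern has the concrete type, the string operations are available on it after a successful match, which is the point made in the paragraph above.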

(Of course, this transform is not complete; renaming methods is not enough, because we must also rename the corresponding method invocations. This is left as an exercise for the reader.)
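
As a hint for the exercise, one possible approach (among several) is to use two passes, since a call to a function may appear before the definition of that function is visited. The sketch below illustrates the idea on a toy tree rather than on the phc AST: CallNode, MethodNode, find_db_methods, rename_db_methods_and_calls and two_pass_example are hypothetical names, and a real plugin would express each pass as a visitor.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for the AST; these are NOT phc's classes.
struct CallNode   { std::string name; };
struct MethodNode { std::string name; std::vector<CallNode> body; };

// Pass 1: collect the original names of methods that call a dbx_ function.
inline std::set<std::string> find_db_methods(const std::vector<MethodNode>& script)
{
   std::set<std::string> db_methods;
   for(const MethodNode& m : script)
      for(const CallNode& c : m.body)
         if(c.name.find("dbx_") == 0)
         {
            db_methods.insert(m.name);
            break;
         }
   return db_methods;
}

// Pass 2: rename the collected methods and every call to them.
inline void rename_db_methods_and_calls(std::vector<MethodNode>& script)
{
   const std::set<std::string> db_methods = find_db_methods(script);
   for(MethodNode& m : script)
   {
      for(CallNode& c : m.body)
         if(db_methods.count(c.name))
            c.name += "_DB";
      if(db_methods.count(m.name))
         m.name += "_DB";
   }
}

// Usage example: the caller is visited before the definition of first.
inline void two_pass_example()
{
   std::vector<MethodNode> script(2);
   script[0].name = "main_page";
   script[0].body.push_back(CallNode{"first"});
   script[1].name = "first";
   script[1].body.push_back(CallNode{"dbx_error"});

   rename_db_methods_and_calls(script);

   assert(script[0].name == "main_page");
   assert(script[0].body[0].name == "first_DB");
   assert(script[1].name == "first_DB");
   assert(script[1].body[0].name == "dbx_error");
}
```

Here the state is the set of names gathered in the first pass, which plays the same role across the whole script that uses_dbx plays within a single method.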

What's Next?

Tutorial 5 explains how to change the order in which the children of a node are visited, how to avoid visiting some children, and how to execute a piece of code in between visiting two children.

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.