Tutorial 2: Modifying Tree Nodes

Now that we have seen in Tutorial 1 how to inspect the tree, this tutorial looks at modifying it. The task we set ourselves is to replace all calls to mysql_connect with calls to dbx_connect. (dbx is a PECL extension to PHP that allows scripts to interface with a database independently of the type of the database; this conversion could be part of a larger refactoring process that makes a script written for MySQL work with other databases.)

The plugin we develop in this tutorial is available as MySQL2DBX.so in the phc distribution. To see its effect, run phc as follows:

phc --run plugins/tutorials/MySQL2DBX.so --dump-php test.php

First Attempt

As in the previous tutorial, we are interested in all function calls. However, now we are interested only in function calls to mysql_connect. Let us have a look at the precise definition of a function call according to the grammar:

method_invocation ::= target method_name params:actual_parameter* ;
method_name ::= METHOD_NAME | reflection ;
actual_parameter ::= is_ref:"&"? expr ;

(The target of a method invocation is the class or object the function gets invoked on. It is explained in Tutorial 3, and need not worry us here.) For now, we are only interested in the method_name. The grammar tells us that a method_name is either a METHOD_NAME or a reflection. If a symbol is written in CAPITALS in the grammar, it refers to a “literal”; in this case, to an actual method name (such as mysql_connect). In PHP, it is also possible to call a method whose name is stored in a variable; in that case, the method name will be a reflection node (which contains an expr). In this tutorial, we are interested in “normal” method invocations only.
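The difference between the two alternatives of method_name can be sketched in a few lines of stand-alone C++. This is only an illustration: the classes Method_name_sketch, Literal_name_sketch and Reflection_sketch and the helper is_literal_call_to are hypothetical and are not part of phc; they merely model "a literal name" versus "a name computed at run time".

```cpp
#include <string>

// Hypothetical miniature of the two method_name alternatives; not phc's classes.
struct Method_name_sketch
{
   virtual ~Method_name_sketch() {}
};

// A literal name such as mysql_connect (METHOD_NAME in the grammar).
struct Literal_name_sketch : Method_name_sketch
{
   std::string value;
   explicit Literal_name_sketch(const std::string& v) : value(v) {}
};

// A name computed at run time, as in $f() (reflection in the grammar).
struct Reflection_sketch : Method_name_sketch
{
   std::string variable;
   explicit Reflection_sketch(const std::string& v) : variable(v) {}
};

// True only for a literal call to the wanted function name.
bool is_literal_call_to(const Method_name_sketch* name, const std::string& wanted)
{
   const Literal_name_sketch* literal =
      dynamic_cast<const Literal_name_sketch*>(name);
   return literal != 0 && literal->value == wanted;
}
```

A call through a variable is never reported as a literal call, whatever the variable happens to be named, which is the behaviour we want when we restrict ourselves to “normal” method invocations.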

Literals (or “terminal symbols”) do not get represented by a class called AST_something, but by a class called Token_something instead. All classes Token_something have an attribute called value which corresponds to the value of the token. For most tokens, the type of value is an STL String*. However, for some tokens, for example INT, value has a different type (e.g., int). If the token has a non-standard type, it will have an additional attribute called source_rep, which corresponds to the representation of the token in the source. For example, the real number 5E-1 would have value equal to the (double) 0.5, but source_rep equal to (the String*) “5E-1”.
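To make the distinction between value and source_rep concrete, here is a minimal stand-alone sketch. It does not use the phc headers: the struct Real_token_sketch and the helper make_real_token are hypothetical stand-ins for a token with a non-string value, and std::string takes the place of String*.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a token with a non-string value.
// Not phc's actual class; it only mirrors the two attributes described above.
struct Real_token_sketch
{
   double value;            // the parsed value of the token
   std::string source_rep;  // the token exactly as written in the source
};

// Build a token from its source text (assumed helper, not part of phc).
Real_token_sketch make_real_token(const std::string& text)
{
   Real_token_sketch token;
   token.value = std::strtod(text.c_str(), 0);
   token.source_rep = text;
   return token;
}
```

For the source text 5E-1, value holds the double 0.5 while source_rep keeps the original spelling, so the number can be printed back the way the programmer wrote it.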

Thus, we arrive at the following first attempt.

#include <phc/Tree_visitor.h>

class MySQL2DBX : public Tree_visitor
{
public:
   void post_method_invocation(AST_method_invocation* in)
   {
      Token_method_name* name;
	  
      name = new Token_method_name(new String("mysql_connect"));
      if(in->method_name->match(name))
      {
         in->method_name = new Token_method_name(new String("dbx_connect"));
      }
   }
};

extern "C" void process_ast(AST_php_script* php_script)
{
   MySQL2DBX m2d;
   php_script->visit(&m2d);
}

Note: phc uses a garbage collector, so there is never any need to free objects (you never have to call delete). This makes programming much easier and less error-prone.

match compares two (sub)trees for deep equality. There is also a function called deep_equals, which does nearly the same thing, but with two important differences. First, match does not take comments, line numbers and other “additional” information into account, whereas deep_equals does. Second, match supports wildcards; this will be explained in Tutorial 3.
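The first difference explains why the plugin uses match: the pattern we build with new has no line number, whereas the node in the tree does. The following stand-alone sketch models this with a toy node; Name_sketch, match_sketch and deep_equals_sketch are hypothetical names chosen for illustration and do not reproduce phc's implementation (in particular, wildcards are left out).

```cpp
#include <string>

// Hypothetical miniature of a method-name token; not phc's Token_method_name.
struct Name_sketch
{
   std::string value;  // the method name itself
   int line_number;    // "additional" information attached to the node
};

// Like match: compare only the content of the node.
bool match_sketch(const Name_sketch& a, const Name_sketch& b)
{
   return a.value == b.value;
}

// Like deep_equals: the additional information must agree as well.
bool deep_equals_sketch(const Name_sketch& a, const Name_sketch& b)
{
   return a.value == b.value && a.line_number == b.line_number;
}
```

A node for mysql_connect found on line 12 of a script and a freshly built pattern with line number 0 are equal according to match_sketch, but not according to deep_equals_sketch.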

Modifying the Parameters

Unfortunately, renaming mysql_connect to dbx_connect is not sufficient, because the parameters to the two functions differ. According to the PHP manual, the signatures of the two functions are

mysql_connect (server, username, password, new_link, client_flags)

and

dbx_connect (module, host, database, username, password, persistent)

The module parameter to dbx_connect should be set to DBX_MYSQL to connect to a MySQL database. Then host corresponds to server, and username and password serve the same purpose in both functions. So, we should insert DBX_MYSQL at the front of the list, and insert NULL between host and username (mysql_connect does not select a database). The last two parameters to mysql_connect (new_link and client_flags) have no equivalent in dbx_connect, so if they are specified, we cannot perform the conversion. The last parameter to dbx_connect (persistent) is optional, and we will ignore it in this tutorial.
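The two insertions described above can be tried out on a plain std::list of strings before we apply them to the tree. This is a sketch under stated assumptions: to_dbx_arguments is a hypothetical helper, the arguments are modelled as strings rather than AST nodes, and a call without any arguments is simply given DBX_MYSQL and NULL.

```cpp
#include <list>
#include <string>

// Rewrites the textual arguments of a mysql_connect call into those of
// dbx_connect. Returns false (leaving args untouched) when the call uses
// new_link or client_flags, which dbx_connect cannot express.
bool to_dbx_arguments(std::list<std::string>& args)
{
   if(args.size() > 3)
      return false;

   // insert returns an iterator to the new element, so advancing it once
   // lands on the original first argument (host), if there is one.
   std::list<std::string>::iterator pos =
      args.insert(args.begin(), std::string("DBX_MYSQL"));
   ++pos;

   // Step past host so that NULL ends up between host and username.
   if(pos != args.end())
      ++pos;
   args.insert(pos, std::string("NULL"));
   return true;
}
```

Note that inserting into a std::list leaves existing iterators pointing at the same elements, so it is easiest to continue from the iterator that insert returns rather than from the one passed in.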

Now, in phc, DBX_MYSQL is an AST_constant. Because phc treats everything as belonging to a class, a constant must also be defined in a class. Constants defined in the PHP standard library, such as DBX_MYSQL, should be defined in the special “class” %STDLIB%. (For more information on how PHP gets converted to the abstract syntax tree, see Representing PHP.)

Finally, NULL is represented by Token_null.

We are now ready to write our conversion function:

#include <cstdio>
#include <phc/Tree_visitor.h>

class MySQL2DBX : public Tree_visitor
{
public:
   void post_method_invocation(AST_method_invocation* in)
   {
      AST_actual_parameter_list::iterator pos;
      Token_constant_name* module_name;
      AST_constant* module_constant;
      AST_actual_parameter* param;
      Token_method_name* name;
	  
      name = new Token_method_name(new String("mysql_connect"));
      if(in->method_name->match(name))
      {
         // Check for too many parameters
         if(in->actual_parameters->size() > 3)
         {
            printf("Error: unable to translate call "
               "to mysql_connect on line %ld\n", in->get_line_number());
            return;
         }
      
         // Modify name
         in->method_name = new Token_method_name(new String("dbx_connect"));
      
         // Modify parameters
         module_name = new Token_constant_name(new String("DBX_MYSQL"));
         module_constant = new AST_constant("%STDLIB%", module_name);
         
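         pos = in->actual_parameters->begin();
         param = new AST_actual_parameter(false, module_constant);
         // insert returns an iterator to the newly inserted DBX_MYSQL
         // (assuming the parameter list follows the STL list interface)
         pos = in->actual_parameters->insert(pos, param);
         pos++; /* Skip DBX_MYSQL */
         pos++; /* Skip host */
         Token_null* null = new Token_null(new String("NULL"));
         param = new AST_actual_parameter(false, null);
         // Insert NULL (no database selected) between host and username
         in->actual_parameters->insert(pos, param);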
      }
   }
};

extern "C" void process_ast(AST_php_script* php_script)
{
   MySQL2DBX m2d;
   php_script->visit(&m2d);
}

If we apply this transformation to

$link = mysql_connect('host', 'user', 'pass');

we get

$link = dbx_connect(DBX_MYSQL, "host", NULL, "user", "pass");

Refactoring

A quick note on refactoring. Refactoring is the process of modifying existing programs (here, PHP scripts), usually so that they work in new projects or in different setups (for example, with a different database engine). Manual refactoring is laborious and error-prone, so tool support is a must. Although phc can be used to refactor PHP code as shown in this tutorial, a dedicated refactoring tool for PHP would be easier to use (though of course less flexible). Such a tool could, however, be built on top of phc. See also the list of Spinoff Projects.

What's Next?

Tutorial 3 explains how you can modify the structure of the tree, as well as the tree nodes.

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.