Restructuring the Tree

Now that we have seen in Traversing the Tree how we can traverse the tree, and in Modifying Tree Nodes how we can modify individual nodes in the tree, in this tutorial we will look at modifying the structure of the tree itself.

The transform that we will be considering in this tutorial is one that is used in phc itself. The transform is called Remove_concat_null and can be found in src/process_ast/Remove_concat_null.h. The purpose of the transform is to remove string concatenation with the empty string. For example,

<?php
   $s = "foo" . "";
?>

is translated to

<?php
   $s = "foo";
?>

The reason that this transform is implemented in phc is due to how the phc parser deals with in-string syntax. For example, if you write

$a = "foo $b bar";

the corresponding tree generated by phc is

$a = "foo " . $b . " bar";

In other words, the variables are pulled out of the string, and the various components are then concatenated together. However, taken to its logical conclusion, that means that if you write

$a = "foo $b";

the parser generates

$a = "foo " . $b . "";

Obviously, the second concatenation is unnecessary, and the Remove_concat_null transform cleans this up. In this tutorial we will explain how this transform can be written.

Introducing the Tree_transform API

Concatenation is a binary operator, so we are interested in nodes of type Bin_op. If you check the grammar, or, alternatively, src/generated/AST.h, you will find that Bin_op` has three attributes: a left and a right expression (of type Expr) and the operator itself (op of type OP). Thus, we are interested in nodes of type Bin_op whose op equals the single dot (for string concatenation).

Based on the previous two tutorials, we might try something like this:

class Remove_concat_null : public Visitor
{
public:
   void pre_bin_op(Bin_op* in)
   {
      // Find concat operators
      if (*in->op->value == ".")
      {
         // ...
      }
   }
}

The problem is, what are we going to do inside the if? Tree visitors can only inspect and modify *in; they cannot restructure the tree. In particular, we cannot replace *in by a new node. For this purpose, phc offers a separate API, the tree transformation API. It looks very similar to the tree visitor API, but there are two important differences. First, the pre and post methods can modify the structure of the tree by returning new nodes. Second, there are no “generic” methods in the tree transform API. So, it is not possible to define a transformation that would replace all statements by something else. (It is not clear how that would be useful, anyway.)

So, we need to write our transformation using the Tree_transform API, defined in AST_transform.h. Restructuring the class above yields

class Remove_concat_null : public Transform
{
public:
   Expr* pre_bin_op(Bin_op* in)
   {
      // Find concat operators
      if(*in->op->value == ".")
      {
         // ...
      }
   }
}

The differences between the previous version have been highlighted. We inherit from a different class, and pre_bin_op now has a return value, which is the node that will replace *in. If you check the default implementation of pre_bin_op in AST_transform.cpp, you’ll find:

Expr* Transform::pre_bin_op(Bin_op* in)
{
   return in;
}

The return in; is very important; as we mentioned before, the return value of pre_bin_op will replace *in in the tree. Therefore, if we don’t want to replace *in, or perhaps if we want to replace *in only if a particular condition holds, we must return in. This will replace *in by in itself.

The second thing to note is that the return type of pre_bin_op is Expr instead of Bin_op. This means that we can replace a binary operator node by another other expression node. The Maketea Theory explains exactly how the signatures for the pre and post methods are derived, but in most cases they are what you’d expect. The easiest way to check is to simply look them up in <AST_transform.h>.

The Implementation

We wanted to get rid of useless concatenation operators. To be precise, if the binary operator is the concatenation operator, and the left operand is the empty string, we want to replace the node by the right operand; similarly, if the right operand is the empty string, we want to replace the operator by its left operand. Here’s the full transform:

class Remove_concat_null : public Transform
{
public:
   Expr* post_bin_op(Bin_op* in)
   {
      STRING* empty = new STRING(new String(""));
      Wildcard<Expr>* wildcard = new Wildcard<Expr>;

      // Replace with right operand if left operand is the empty string
      if(in->match(new Bin_op(empty, wildcard, ".")))
         return wildcard->value;

      // Replace with left operand if right operand is the empty string
      if(in->match(new Bin_op(wildcard, empty, ".")))
         return wildcard->value;

      return in;
   }
}

We already explained what match does in Modifying Tree Nodes, but we have not yet explained the use of wildcards. If you are using a wildcard (WILDCARD) in a pattern passed to match, match will not take that subtree into account. Thus,

if(in->match(new Bin_op(empty, WILDCARD, ".")))

can be paraphrased as “is in a binary operator with the empty string as the left operand and "." as the operator (I don’t care about the right operand)?” If the match succeeded, you can find out which expression was matched by the wildcard by accessing wildcard->value.

Running Transformations

Recall from the previous two tutorials that visitors are run with a call to visit:

extern "C" void run_ast (PHP_script* in, Pass_manager* pm, String* option)
{
    SomeVisitor visitor;
    in->visit(&visitor);
}

Likewise, transformations are run with a call to transform_children:

extern "C" void run_ast (PHP_script* in, Pass_manager* pm, String* option)
{
    SomeTransform transform;
    in->transform_children(&transform);
}

We invoke transform_children because we should not replace the top-level node in the AST (the PHP_script node itself).

A Subtlety

If you don’t understand this section right now, don’t worry about it; you might find it useful to read it again after having gained some experience with the transformation API.

We have implemented the transform as a post-transform rather than a pre- transform. Why? Suppose we implemented the transform as a pre-transform. Consider the following PHP expression (bracketed explicitly for emphasis:)

("" . $a) . ""

The first binary operator we encounter is the second one (get phc to print the tree if you don’t see why.) So, we apply the transform and replace the operator by its left operand, which happens to be ("" . $a). We then continue and transform the children of the that node, because that is how the tree transform API is defined. But the children of that node are "" and $a. So, that means that the other binary operator itself will never be processed!

There are two solutions to this problem. The first is the one we used above, and use a post-transform instead of a pre-transform. You should try to reason out why this works, but a rule of thumb is that unless there is a good reason to use a pre-transform, it’s safer to use the post-transform, because in the post-transform the children of the node have already been transformed, so that you are looking at the “final” version of the node.

The second solution is to use a pre-transform, but explicitly tell phc to transform the new node in turn. This is the less elegant solution, but sometimes this is the only solution that will work (see for example the Token_conversion transform in the src/process_ast/Token_conversion.cpp). To do this, you would replace

return in->right;

by

return in->right->pre_transform(this);

What’s Next?

The next tutorial in this series, Using State, introduces a very important notion in transforms: the use of state.

Table Of Contents

Previous topic

Modifying Tree Nodes

Next topic

Using State

This Page