
What's in Store?

Below are some of the features we are planning to add to phc, approximately in the order we'd like to implement them. Version 0.1.6 saw the last major improvement of the tree transformation and tree visitor APIs; the focus is now on implementing an IR (see first section, below).

Intermediate Representation (IR)

The first step towards actual compilation will be to translate PHP scripts into a low-level intermediate representation. This representation looks like machine language (or Java bytecode), but is actually machine independent. The purpose of the IR is to define a very simple language with as few constructs as possible, one that makes explicit most of the details that are implicit in PHP scripts, and over which we can define all sorts of compiler optimizations.

To give you an idea, an if-statement such as

if($c == 1)
  $d = $c * 2 + 1;

might look like

   $r := $c.equals 1
   $r := $r.not
   if $r goto l1
   $t := $c.mult 2
   $d := $t.add 1
l1:

in the intermediate representation. We are currently investigating what the IR should look like exactly.
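To make the lowering step a little more concrete, here is a minimal sketch in Python (not phc code) that turns a single if-statement with one assignment into instructions in the notation used above. The tuple-based expression tree, the function name lower_if, the temporaries $t1, $t2, ... and the fixed label l1 are all assumptions made for illustration; the real IR is still being designed.

```python
from itertools import count

# Hypothetical mini-AST (an assumption, not phc's data structures):
#   ("var", name) | ("const", value) | ("binop", op, left, right)

def lower_if(cond, target, value):
    """Lower `if (cond) $target = value;` to a list of IR lines."""
    code = []
    temps = count(1)

    def lower(node):
        # Return an operand; emit one instruction per binary operation.
        kind = node[0]
        if kind == "var":
            return "$" + node[1]
        if kind == "const":
            return str(node[1])
        _, op, left, right = node
        lhs = lower(left)
        rhs = lower(right)
        temp = "$t%d" % next(temps)
        code.append("%s := %s.%s %s" % (temp, lhs, op, rhs))
        return temp

    test = lower(cond)
    negated = "$t%d" % next(temps)
    code.append("%s := %s.not" % (negated, test))
    code.append("if %s goto l1" % negated)   # skip the body if cond is false
    result = lower(value)
    code.append("$%s := %s" % (target, result))
    code.append("l1:")                        # single fixed label: one `if` only
    return code


# if ($c == 1) $d = $c * 2 + 1;
cond = ("binop", "equals", ("var", "c"), ("const", 1))
value = ("binop", "add",
         ("binop", "mult", ("var", "c"), ("const", 2)),
         ("const", 1))
for line in lower_if(cond, "d", value):
    print(line)
```

Unlike the hand-written listing, this sketch never reuses a temporary and ends with an extra copy into $d; removing such copies is the kind of clean-up that later optimization passes would take care of.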

Generating Machine Code

Once we have defined an IR and are able to translate PHP scripts into this IR, we can start generating machine code. We will generate Intel x86 code for Linux, although it should not be too difficult to add support for other systems as well (especially similar systems such as FreeBSD). We might not support all features of PHP at first, preferring to make phc useful for a (smaller) class of scripts as soon as possible.

Link to the PHP Standard Library

Of course, the power of PHP lies in the PHP standard library. Having finished code generation, we will make sure that PHP scripts compiled with phc can make use of the standard library as-is.

Compile Extensions

We are planning to make phc generate PHP extensions that can be loaded into the PHP interpreter. This means that you can write PHP extensions in PHP and use phc to compile them. This will be very useful for people who write extensions for speed or to protect proprietary code.

Static Analyses and Code Optimization

Once all that is done, we can start the really interesting work: optimizing scripts. Many of these optimizations require analyses that are defined over the IR. It is our intention to make the results of these analyses available at the AST level as well, so that it is possible to define transformations that operate on PHP source (e.g., refactoring), but make use of analyses defined on the IR. Here are a couple of optimizations we are planning to implement:

Static Single Assignment (SSA) Form

SSA is a transformation on the IR that guarantees that every variable is assigned in exactly one location in the program. This is a very useful transformation, because it makes many other optimizations much easier to define. Consider the following PHP script:

<?php
   $x = foo();
   echo $x;

   $x = $x + bar();
   echo $x;
?>

In SSA form, this becomes

<?php
   $x0 = foo();
   echo $x0;

   $x1 = $x0 + bar();
   echo $x1;
?>

This makes the relation between assignments to variables and uses of variables more obvious, which is useful for a number of things. For example, consider a refactoring that makes sure that the result of dbx_connect always gets stored in a variable called $dbx_link. Then, the following code

<?php
   $x = dbx_connect(...);
   some_function($x);
   
   $x = some_other_function();
   some_function($x);
?>

should get translated to

<?php
   $dbx_link = dbx_connect(...);
   some_function($dbx_link);
   
   $x = some_other_function();
   some_function($x);
?>

In particular, the second some_function($x) should not be modified. Converting the program to SSA form first will make this easier.
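The following Python sketch (not phc code) illustrates why SSA helps here. It handles straight-line code only, so no phi functions are needed, and it assumes every variable is assigned before it is used. The statement encoding (target, callee, args) and the function names to_ssa and rename_connect_results are hypothetical; call arguments that are elided in the example above are simply left out of the model.

```python
def to_ssa(stmts):
    """Give every assignment its own version of the variable.

    Each statement is (target, callee, args): target is a variable name or
    None, callee is a function name, args is a list of variable names.
    SSA names are (name, version) pairs.
    """
    version = {}   # variable name -> latest version number
    out = []
    for target, callee, args in stmts:
        # A use refers to the most recent definition of the variable.
        new_args = [(a, version[a]) for a in args]
        new_target = None
        if target is not None:
            version[target] = version.get(target, -1) + 1
            new_target = (target, version[target])
        out.append((new_target, callee, new_args))
    return out


def rename_connect_results(stmts, callee="dbx_connect", new_name="dbx_link"):
    """Rename only those variables whose value comes from `callee`."""
    ssa = to_ssa(stmts)
    # In SSA form each version has exactly one definition, so this set
    # identifies precisely the values produced by `callee`.
    chosen = {t for t, c, _ in ssa if t is not None and c == callee}

    def back(v):
        # Leave SSA form again: chosen versions get the new name,
        # all others fall back to their original name.
        return new_name if v in chosen else v[0]

    return [(None if t is None else back(t), c, [back(a) for a in args])
            for t, c, args in ssa]


program = [
    ("x", "dbx_connect", []),
    (None, "some_function", ["x"]),
    ("x", "some_other_function", []),
    (None, "some_function", ["x"]),
]
for stmt in rename_connect_results(program):
    print(stmt)
```

Because the two assignments to $x become distinct versions, the second call to some_function refers to a version that was not produced by dbx_connect and is therefore left alone. Scripts with branches or loops would need a full SSA construction with phi functions, which this sketch does not attempt.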

Type Inference

Type inference tries to find out the type of each variable. We will do type inference on the SSA form of the program, which means that every variable can only have a single type (because it is only assigned once). Apart from compilation, type inference is useful for numerous other tasks. As a simple example, consider implementing a semantic checker (see Spinoff Projects). In the following code,

$x = $x + $y;

if we can deduce that both $x and $y have type string, we might issue a warning that the plus (+) should probably be a dot (.). (PHP will warn about this at run-time if error_reporting is set high enough, but it is useful to catch these errors at compile time.)
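A check like this could be sketched as follows in Python (not phc code). The sketch works on straight-line code that is already in SSA form, so each variable receives exactly one type. The expression encoding, the type names and the function infer_types are assumptions made for illustration, and only literals, variables, + and . are modelled.

```python
def infer_types(stmts):
    """Infer one type per SSA variable and collect warnings.

    Each statement is (target, expr), where expr is one of
      ("string", value) | ("int", value) | ("var", name)
      ("add", left, right) | ("concat", left, right)
    """
    types = {}      # SSA variable -> inferred type
    warnings = []

    def type_of(expr):
        kind = expr[0]
        if kind in ("string", "int"):        # literals carry their type
            return kind
        if kind == "var":
            return types.get(expr[1], "unknown")
        if kind == "concat":                 # "." always yields a string
            type_of(expr[1])
            type_of(expr[2])
            return "string"
        if kind == "add":
            left = type_of(expr[1])
            right = type_of(expr[2])
            if left == "string" and right == "string":
                warnings.append("'+' applied to two strings; did you mean '.'?")
            return "number"                  # int or float, not distinguished here
        return "unknown"

    for target, expr in stmts:
        types[target] = type_of(expr)
    return types, warnings


# SSA form of: $x = "foo"; $y = "bar"; $x = $x + $y;
program = [
    ("x0", ("string", "foo")),
    ("y0", ("string", "bar")),
    ("x1", ("add", ("var", "x0"), ("var", "y0"))),
]
types, warnings = infer_types(program)
print(types)
print(warnings)
```

Note that $x has type string before the addition and a numeric type afterwards; this is only expressible because SSA splits it into x0 and x1.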

For a more complicated example, consider renaming a class method. If we rename method foo to bar in class A (but not in class B) in the following example,

<?php
   class A
   {
      function foo()
      { 
         echo "Hello "; 
      }
   }

   class B
   {
      function foo()
      { 
         echo "world!"; 
      }
   }

   $a = new A();
   $b = new B();

   $a->foo();
   $b->foo();
?>

we should get

<?php
   class A
   {
      function bar()
      { 
         echo "Hello "; 
      }
   }

   class B
   {
      function foo()
      { 
         echo "world!"; 
      }
   }

   $a = new A();
   $b = new B();

   $a->bar();
   $b->foo();
?>

In particular, the line $b->foo(); should not be modified. We can only do this if we know that the type of $a is A, and the type of $b is B. Type inference cannot always be 100% accurate; in such cases, the refactoring should fail (or insert code to explicitly distinguish between a few types).
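The decision logic for the call sites might look roughly like the Python sketch below (not phc code). It assumes that type inference has already produced a map from variable names to class names, and it ignores inheritance. The names rename_method_calls and AmbiguousReceiver are hypothetical. Of the two fallback strategies mentioned above, the sketch implements the simpler one: it fails when a receiver's type is unknown.

```python
class AmbiguousReceiver(Exception):
    """Raised when a call site cannot be resolved to a single class."""


def rename_method_calls(calls, var_types, cls, old, new):
    """Rename calls to method `old` of class `cls` to `new`.

    calls is a list of (receiver_variable, method_name) pairs and
    var_types maps variable names to inferred class names.
    """
    result = []
    for receiver, method in calls:
        if method != old:
            result.append((receiver, method))     # unrelated call
            continue
        receiver_type = var_types.get(receiver)
        if receiver_type is None:
            # Without a type we cannot tell which foo() this is,
            # so the refactoring refuses to proceed.
            raise AmbiguousReceiver(
                "cannot tell whether $%s is a %s" % (receiver, cls))
        renamed = new if receiver_type == cls else method
        result.append((receiver, renamed))
    return result


calls = [("a", "foo"), ("b", "foo")]
var_types = {"a": "A", "b": "B"}
print(rename_method_calls(calls, var_types, "A", "foo", "bar"))
```

Failing early is a conservative choice: a refactoring that silently renames the wrong call would change the behaviour of the script.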

So stay tuned :-)

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.