
Getting Started

For this introductory tutorial, we assume that you have successfully downloaded and installed phc, and that you know how to run it (see the Installation Instructions and Running phc). This tutorial gets you started with using phc to develop your own tools for PHP by writing plugins.

Compiling a Plugin

To get up and running, we'll first write a “hello world” plugin that does nothing except print a string. Create a new directory, say ~/myplugins, and create a new file helloworld.cpp in it:

#include <phc/ast.h>
#include <iostream>

using namespace std;

extern "C" void process_ast(AST_php_script* php_script)
{
   cout << "Hello world (I'm a phc plugin!)" << endl;
}

This is an example of a minimal plugin. Every plugin you write must contain a process_ast function with this exact signature. To compile the plugin, run

~/myplugins$ phc_compile_plugin helloworld.cpp -o helloworld.so

(phc_compile_plugin is a small shell script that makes the task of compiling plugins easier. All it does is call g++ with a couple of options; if you're curious, you can open it in any text editor.) Finally, run the plugin using

~/myplugins$ phc --run helloworld.so sometest.php

(You need to pass in an input script to phc even though our plugin does not use it.) If that worked as expected, congratulations: you've just written your first phc plugin! :-)

About extern "C"

You may have been wondering what the extern "C" in the definition of process_ast is for. phc uses the Unix dlopen interface to load your plugin. If you do not declare process_ast as extern "C", the C++ compiler mangles the name of the function, and phc will not be able to find the process_ast symbol in your plugin. Incidentally, this does not mean that you cannot write C++ code inside process_ast.

If you don't understand any of that, don't worry about it: just remember that you need to declare process_ast as extern "C" and everything will be fine. (You don't need extern "C" for any other functions you might define.)
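
To make the role of extern "C" a little more concrete, here is a rough sketch of how a host program could look up a plugin entry point by name. This is not phc's actual loader: the helper find_entry_point is a made-up name used for illustration only, and the only real calls are the POSIX functions dlopen, dlsym and dlclose.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose (POSIX)
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT part of phc: returns the address of `symbol`
// inside the shared object at `path`, or NULL if either cannot be found.
// On success the library is deliberately left open so that the returned
// address stays valid.
void* find_entry_point(const char* path, const char* symbol)
{
   assert(path != NULL && symbol != NULL);

   void* handle = dlopen(path, RTLD_NOW);
   if (handle == NULL)
      return NULL;               // file missing or not a loadable shared object

   void* address = dlsym(handle, symbol);
   if (address == NULL)
      dlclose(handle);           // nothing is exported under that exact name

   return address;
}
```

A call such as find_entry_point("helloworld.so", "process_ast") searches for the literal string process_ast. A function compiled without extern "C" is exported under a compiler-specific mangled name instead, so a lookup like this would come back empty. On some systems you may need to link with -ldl.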

The Abstract Syntax

To be able to do anything useful in your plugins, you need to know how phc represents PHP code internally. phc's view of PHP scripts is described by an abstract grammar. An abstract grammar describes how the contents of a PHP script are structured. A grammar consists of a number of rules. For example, there is a rule in the grammar that describes how if statements work:

if ::= expr iftrue:statement* iffalse:statement* ; 

This rule reads: “An if statement consists of an expression (the condition of the if-statement), a list of statements called `iftrue' (the instructions that get executed when the condition holds), and another list of statements called `iffalse' (the instructions that get executed when the condition does not hold)”. The asterisk (*) in the rule means “list of”.

As a second example, consider the rule that describes arrays in PHP. This rule should cover things such as array(), array("a", "b") and array(1 => "a", 2 => "g"). Arrays are described by the following two rules.

array ::= array_elem* ;
array_elem ::= key:expr? val:expr ;

(Actually, this is a simplification, but it will do for the moment.) These two rules say that “an array consists of a list of array elements”, and an “array element has an optional expression called `key', and a second expression called `val'”. The question mark (?) means “optional”. Note that the grammar does not record the need for the keyword array, or for the parentheses and commas. We do not need to record these, because we already know that we are talking about an array; all we need to know is what the array elements are.

The Abstract Syntax Tree

When phc reads a PHP script, it builds up an internal representation of the script. This representation is known as an abstract syntax tree (or AST for short). The structure of the AST follows directly from the abstract grammar. For people familiar with XML, this tree can be compared to the DOM representation of an XML document (and in fact, phc can output the AST as an XML document).

For example, consider if-statements again. An if-statement is represented by an instance of the AST_if class, which is (approximately) defined as follows.

class AST_if
{
public:
   AST_expr* expr;
   AST_statement_list* iftrue;
   AST_statement_list* iffalse;
};

Thus, the name of the rule (if ::= ...) translates into a class AST_if, and the elements on the right-hand side of the rule (expr iftrue:statement* iffalse:statement*) correspond directly to the class members. The class AST_statement_list inherits from the STL list class, and can thus be treated as such.
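
As a small illustration of treating the statement lists as ordinary STL lists, the sketch below counts the statements directly inside both branches of an if-statement. The types here are simplified stand-ins so that the example compiles on its own; they are not the real definitions from <phc/ast.h>, and count_branch_statements is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Stand-in types for illustration only; the real classes live in
// <phc/ast.h> and contain more than is shown here.
class AST_statement {};
class AST_expr {};

class AST_statement_list : public std::list<AST_statement*> {};

class AST_if
{
public:
   AST_expr* expr;
   AST_statement_list* iftrue;
   AST_statement_list* iffalse;
};

// Hypothetical helper: counts the statements directly inside both branches
// (statements nested deeper are not counted).
std::size_t count_branch_statements(const AST_if* node)
{
   assert(node != NULL && node->iftrue != NULL && node->iffalse != NULL);

   // size() is inherited from std::list.
   std::size_t total = node->iftrue->size();

   // Plain iterator loops work too; this one walks the else-branch.
   for (AST_statement_list::const_iterator it = node->iffalse->begin();
        it != node->iffalse->end(); ++it)
      ++total;

   return total;
}
```

Both styles (calling size() and walking the list with an iterator) rely only on the fact that the list type derives from std::list.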

Similarly, the class definitions for arrays and array elements look like

class AST_array
{
public:
   AST_array_elem_list* array_elems;
};

class AST_array_elem
{
public:
   AST_expr* key;
   AST_expr* val;
};
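
The optional key in the array_elem rule has to be represented somehow in the class. The sketch below assumes that a missing key shows up as a NULL pointer, and that AST_array_elem_list derives from the STL list in the same way as AST_statement_list; both are assumptions made for this example, and the types are stand-ins rather than the real <phc/ast.h> classes.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Stand-in types for illustration only, not the real <phc/ast.h> classes.
class AST_expr {};

class AST_array_elem
{
public:
   AST_expr* key;   // ASSUMPTION: NULL when the element has no explicit key
   AST_expr* val;
};

class AST_array_elem_list : public std::list<AST_array_elem*> {};

class AST_array
{
public:
   AST_array_elem_list* array_elems;
};

// Hypothetical helper: returns how many elements carry an explicit key.
std::size_t count_keyed_elems(const AST_array* array)
{
   assert(array != NULL && array->array_elems != NULL);

   std::size_t keyed = 0;
   for (AST_array_elem_list::const_iterator it = array->array_elems->begin();
        it != array->array_elems->end(); ++it)
   {
      if ((*it)->key != NULL)
         ++keyed;
   }
   return keyed;
}
```

Under these assumptions, array("a", "b") would yield 0 keyed elements and array(1 => "a", 2 => "g") would yield 2.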

When you start developing applications with phc you will find it useful to consult the full description of the grammar, which can be found in the Grammar Definition. A detailed explanation of the structure of this grammar, and how it converts to the C++ class structure, can be found in the Grammar Formalism. Some notes on how phc converts normal PHP code into abstract syntax can be found in Representing PHP.

Working with the AST

When you want to build tools based on phc, you do not have to understand how the abstract syntax tree is built, because this is done for you. Once the tree has been built, you can examine or modify the tree in any way you want. When you are finished, you can ask phc to output the tree to normal PHP code again.

Let's write a very simple plugin that counts the number of class definitions in a script. If you look at the grammar, you will notice that class definitions are represented by a (C++) class called AST_class_def. So, we need to count the number of objects of type AST_class_def in the tree. Create a new file ~/myplugins/count_classes.cpp. Recall the skeleton plugin:

#include <phc/ast.h>

extern "C" void process_ast(AST_php_script* php_script)
{
}

You will notice that process_ast gets passed an object of type AST_php_script. This is the top-level node of the generated AST. If you look at the grammar, you will find that AST_php_script corresponds to the following rule:

php_script ::= interface_def* class_def+ ;

Thus, as far as phc is concerned, a PHP script consists of a number of interface definitions, followed by a number of class definitions (see Representing PHP). The plus (+) in this rule is similar to an asterisk (*), but indicates that there must be at least one item in the list. In other words, a PHP script need not have any interface definitions, but it must have at least one class definition.

By now you should be able to deduce that the class AST_php_script will have two members, called interface_defs and class_defs, both of which are lists. So, to count the number of classes, all we have to do is query the number of elements in the class_defs list:

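#include <phc/ast.h>
#include <iostream>

using namespace std;

extern "C" void process_ast(AST_php_script* php_script)
{
   // size() returns an unsigned size type, so stream it instead of using printf("%d")
   cout << php_script->class_defs->size() << " class definition(s) found" << endl;
}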

Save this file to ~/myplugins/count_classes.cpp. Compile:

~/myplugins$ phc_compile_plugin -o count_classes.so count_classes.cpp

And run:

~/myplugins$ phc --run count_classes.so hello.php

Actually...

If you actually did try to run your plugin, you might think right now that something went wrong: phc appears to report one class definition too many! However, there is a very good reason for this. We said earlier that as far as phc is concerned, a PHP script consists of a number of interface definitions, followed by at least one class definition. So where does the code that is defined outside of any class go?

The answer is that any code defined outside any class goes into a special class called %MAIN%. Any functions you define that do not belong to any class become members of %MAIN%, and any code you write that does not belong to any function becomes part of a special method in %MAIN% called %run%. When phc outputs the tree back to normal PHP code, %MAIN% disappears. When you work with the tree, however, there is no distinction between code defined inside and outside classes: in the tree, everything is defined as part of some class. This makes the tree simpler and easier to work with.
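
If you want the count to reflect only the classes that appear in the source file, one option is to skip %MAIN% while counting. The sketch below uses stand-in types so that it compiles on its own. In particular, storing the class name in a std::string member called name is an assumption made for this example; consult the Grammar Definition for how class_def really records its name. The helper count_user_classes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Stand-in types for illustration only, not the real <phc/ast.h> classes.
class AST_class_def
{
public:
   std::string name;   // ASSUMPTION: the real class may store the name differently
};

class AST_class_def_list : public std::list<AST_class_def*> {};

// Hypothetical helper: counts class definitions, skipping the special
// %MAIN% class that holds code defined outside any class.
std::size_t count_user_classes(const AST_class_def_list* class_defs)
{
   assert(class_defs != NULL);

   std::size_t count = 0;
   for (AST_class_def_list::const_iterator it = class_defs->begin();
        it != class_defs->end(); ++it)
   {
      if ((*it)->name != "%MAIN%")
         ++count;
   }
   return count;
}
```

Comparing names rather than simply subtracting one keeps the helper correct even for a list that happens not to contain %MAIN%.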

More details about how the various PHP constructs are represented in the abstract grammar can be found in Representing PHP.

Writing Stand Alone Applications

If you prefer not to write a plugin but want to modify phc itself to derive a new stand-alone application, you can modify process_ast in process_ast/process_ast.cpp in the phc source tree instead. This has the effect of “hardcoding” your plugin into phc (in versions before 0.1.7, this was the only way to write extensions). However, in the rest of the tutorials we will assume that you are writing your extension as a plugin.

What's Next?

In theory, you now know enough to start implementing your own tools for PHP. Write a new plugin and run it using the --run option. Optionally, pass the --dump-php option as well to request that phc output the tree back to PHP syntax after executing your plugin.
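
For instance, a hand-written traversal that counts if-statements, including nested ones, might look roughly like the sketch below. The types are stand-ins so that the example compiles on its own: the real classes in <phc/ast.h> have more members (the condition expression is omitted here), and the assumption that AST_if derives from a polymorphic AST_statement is made for this example only. The helper count_ifs is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Stand-in types for illustration only, not the real <phc/ast.h> classes.
class AST_statement
{
public:
   virtual ~AST_statement() {}   // polymorphic base so dynamic_cast works
};

class AST_statement_list : public std::list<AST_statement*> {};

class AST_if : public AST_statement   // ASSUMPTION: if is a kind of statement
{
public:
   AST_statement_list* iftrue;
   AST_statement_list* iffalse;
};

// Hypothetical helper: counts if-statements in a statement list, descending
// into both branches of every if-statement found. Any other statement kind
// that can contain statements would need its own, similar case.
std::size_t count_ifs(const AST_statement_list* statements)
{
   assert(statements != NULL);

   std::size_t count = 0;
   for (AST_statement_list::const_iterator it = statements->begin();
        it != statements->end(); ++it)
   {
      const AST_if* node = dynamic_cast<const AST_if*>(*it);
      if (node == NULL)
         continue;               // not an if-statement
      count += 1 + count_ifs(node->iftrue) + count_ifs(node->iffalse);
   }
   return count;
}
```

Even this small example needs an explicit loop, a type test, and a recursive call for each branch, and it only handles one kind of node.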

However, you will probably find that the tree, despite being well-defined and easy to understand, is rather laborious to modify by hand: doing so requires a lot of boring boilerplate code. The good news is that phc provides sophisticated support for examining and modifying the tree. This is explained in detail in the follow-up tutorials.

$LastChangedDate: 2006-09-08 12:24:58 +0100 (Fri, 08 Sep 2006) $. Contents © the authors.