Getting Started
For this introductory tutorial, we assume that you have
successfully downloaded and installed phc,
and that you know how to run it (see the Installation Instructions and Running phc). This tutorial gets you
started with using phc to develop your own
tools for PHP by writing plugins.
Compiling a Plugin
To get up and running, we'll first write a “hello
world” plugin that does nothing except print a string. Create a
new directory, say ~/myplugins, and in it create a new file
helloworld.cpp:
#include <phc/ast.h>
#include <iostream>

using namespace std;

extern "C" void process_ast(AST_php_script* php_script)
{
    cout << "Hello world (I'm a phc plugin!)" << endl;
}
This is an example of a minimal plugin. Every plugin you write
must contain a process_ast function with this exact
signature. To compile the plugin, run
~/myplugins$ phc_compile_plugin helloworld.cpp -o helloworld.so
(phc_compile_plugin is a small shell script that makes
the task of compiling plugins easier. All it does is call
g++ with a couple of options; if you're curious, you can
open it in any text editor.) Finally, run the plugin using
~/myplugins$ phc --run helloworld.so sometest.php
(You need to pass in an input script to phc even
though our plugin does not use it.) If that worked as expected,
congratulations: you've just written your first phc plugin! :-)
About extern "C"
You may have been wondering what the extern "C" in
the definition of process_ast is for. The reason is that
phc uses the Unix dlopen
interface to load your plugin. If you do not declare
process_ast as extern "C", phc will not be able to find the
process_ast symbol in your plugin, because the C++ compiler
will have mangled the name of that function.
Incidentally, you can still write C++ code inside
process_ast; extern "C" only affects how the function is named and linked.
If you don't understand any of that, don't worry about it: just
remember that you need to declare process_ast as
extern "C" and everything will be fine. (You don't need
extern "C" for any other functions you might define.)
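The following sketch illustrates that last point under stated assumptions: only the entry point needs C linkage, while helper functions keep ordinary C++ linkage and the body is free to use C++ features. The empty AST_php_script struct is a stand-in so the fragment compiles without the phc headers, and make_greeting is a hypothetical helper that is not part of phc.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Stand-in for the real class from <phc/ast.h>, so this sketch compiles on its own.
struct AST_php_script {};

// Ordinary C++ helper (hypothetical). Its symbol name is mangled, which is
// harmless because the host never looks it up by name.
std::string make_greeting(const std::string& who)
{
    return "Hello " + who + " (I'm a phc plugin!)";
}

// Entry point: C linkage keeps the exported symbol name exactly "process_ast",
// but the body is still compiled as C++.
extern "C" void process_ast(AST_php_script* php_script)
{
    // Assumption for this sketch: the host always passes a valid script.
    assert(php_script != 0);
    std::cout << make_greeting("world") << std::endl;
}
```

In a real plugin you would include <phc/ast.h> instead of declaring the stand-in struct.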
The Abstract Syntax
To be able to do anything useful in your plugins, you need to know
how phc represents PHP code internally. phc's view of PHP scripts is described by an
abstract grammar. An abstract grammar describes how the
contents of a PHP script are structured. A grammar consists of a
number of rules. For example, there is a rule in the grammar that
describes how if statements work:
if ::= expr iftrue:statement* iffalse:statement* ;
This rule reads: “An if statement consists of
an expression (the condition of the if-statement), a list of
statements called `iftrue' (the instructions that get executed
when the condition holds), and another list of statements called
`iffalse' (the instructions that get executed when the condition
does not hold)”. The asterisk (*) in the rule means
“list of”.
As a second example, consider the rule that describes arrays in
PHP. This rule should cover things such as array(),
array("a", "b") and array(1 => "a", 2 =>
"g"). Arrays are described by the following two rules.
array ::= array_elem* ;
array_elem ::= key:expr? val:expr ;
(Actually, this is a simplification, but it will do for the
moment.) These two rules say that “an array consists of a
list of array elements”, and an “array element has
an optional expression called `key', and a second expression called
`val'”. The question mark (?) means
“optional”. Note that the grammar does not record the need
for the keyword array, or for the parentheses and commas.
We do not need to record these, because we already know that we
are talking about an array; all we need to know is what the array
elements are.
The Abstract Syntax Tree
When phc reads a PHP script, it builds up
an internal representation of the script. This representation is known
as an abstract syntax tree (or AST for short). The structure of
the AST follows directly from the abstract grammar. For people
familiar with XML, this tree can be compared to the DOM representation
of an XML document (and in fact, phc can output
the AST as an XML document).
For example, consider if-statements again. An
if-statement is represented by an instance of the
AST_if class, which is (approximately) defined as
follows.
class AST_if
{
public:
    AST_expr* expr;
    AST_statement_list* iftrue;
    AST_statement_list* iffalse;
};
Thus, the name of the rule (if ::= ...) translates into
a class AST_if, and the elements on the right hand side of
the rule (expr iftrue:statement*
iffalse:statement*) correspond directly to the class
members. The class AST_statement_list inherits from the STL
list class, and can thus be treated as such.
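As a rough illustration of what "treated as such" means, the sketch below walks the two branches of an if-statement with ordinary STL list operations. All types here are simplified stand-ins declared locally so that the fragment compiles without the phc headers (the real classes carry more members), and count_branch_statements is a hypothetical helper, not a phc function.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Stand-ins for the classes in <phc/ast.h>.
struct AST_expr {};
struct AST_statement {};

// Assumption: mirrors the text, i.e. a list class derived from the STL list.
class AST_statement_list : public std::list<AST_statement*> {};

struct AST_if
{
    AST_expr* expr;
    AST_statement_list* iftrue;
    AST_statement_list* iffalse;
};

// Hypothetical helper: number of statements directly inside both branches.
std::size_t count_branch_statements(const AST_if* node)
{
    assert(node != 0 && node->iftrue != 0 && node->iffalse != 0);
    std::size_t n = 0;
    // Because the list derives from std::list, ordinary iterators work...
    for (AST_statement_list::const_iterator it = node->iftrue->begin();
         it != node->iftrue->end(); ++it)
    {
        ++n;
    }
    // ...and so do the usual member functions such as size().
    n += node->iffalse->size();
    return n;
}
```

The explicit loop is only there to show iteration; calling size() on both lists would give the same result.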
Similarly, the class definitions for arrays and array elements look
like
class AST_array
{
public:
    AST_array_elem_list* array_elems;
};

class AST_array_elem
{
public:
    AST_expr* key;
    AST_expr* val;
};
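To make the mapping concrete, here is a minimal sketch of how a tree for array(1 => "a", "b") might be inspected. The types are simplified stand-ins declared locally so that the fragment compiles without the phc headers. It assumes that an omitted optional key is stored as a null pointer, which is an assumption of this sketch and not something stated above; count_keyed_elems is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Stand-ins for the classes in <phc/ast.h>.
struct AST_expr {};

struct AST_array_elem
{
    AST_expr* key; // assumption: null when the element has no explicit key
    AST_expr* val;
};

class AST_array_elem_list : public std::list<AST_array_elem*> {};

struct AST_array
{
    AST_array_elem_list* array_elems;
};

// Hypothetical helper: how many elements were written with an explicit key?
std::size_t count_keyed_elems(const AST_array* arr)
{
    assert(arr != 0 && arr->array_elems != 0);
    std::size_t keyed = 0;
    for (AST_array_elem_list::const_iterator it = arr->array_elems->begin();
         it != arr->array_elems->end(); ++it)
    {
        if ((*it)->key != 0)
            ++keyed;
    }
    return keyed;
}
```

For array(1 => "a", "b"), the list would hold two elements, and only the first would have a key.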
When you start developing applications with phc you will find it useful to consult the full
description of the grammar, which can be found in the Grammar Definition. A detailed explanation of
the structure of this grammar, and how it converts to the C++ class
structure, can be found in the Grammar
Formalism. Some notes on how phc converts
normal PHP code into abstract syntax can be found in Representing PHP.
Working with the AST
When you want to build tools based on phc,
you do not have to understand how the abstract syntax tree is built,
because this is done for you. Once the tree has been built, you can
examine or modify the tree in any way you want. When you are finished,
you can ask phc to output the tree to normal PHP
code again.
Let's write a very simple plugin that counts the number of class
definitions in a script. If you look at the grammar, you will notice that class definitions
are represented by a (C++) class called AST_class_def. So,
we need to count the number of objects of type AST_class_def
in the tree. Create a new file ~/myplugins/count_classes.cpp.
Recall the skeleton plugin:
#include <phc/ast.h>
extern "C" void process_ast(AST_php_script* php_script)
{
}
You will notice that process_ast gets passed an object of
type AST_php_script. This is the top-level node of the
generated AST. If you look at the grammar, you
will find that AST_php_script corresponds to the following
rule:
php_script ::= interface_def* class_def+ ;
Thus, as far as phc is concerned, a PHP
script consists of a number of interface definitions, followed by a
number of class definitions (see Representing PHP). The plus
(+) in this rule is similar to an asterisk (*),
but indicates that there must be at least one item in the list. In other
words, a PHP script need not have any interface definitions, but it must
have at least one class definition.
By now you should be able to deduce that the class
AST_php_script will have two members, called
interface_defs and class_defs, both of which
are lists. So, to count the number of classes, all we have to do is
query the number of elements in the class_defs list:
#include <phc/ast.h>
#include <cstdio>

extern "C" void process_ast(AST_php_script* php_script)
{
    // size() returns an unsigned size type, so convert it to match %d
    printf("%d class definition(s) found\n", (int) php_script->class_defs->size());
}
Save this file to ~/myplugins/count_classes.cpp. Compile:
~/myplugins$ phc_compile_plugin -o count_classes.so count_classes.cpp
And run:
~/myplugins$ phc --run count_classes.so hello.php
Actually...
If you actually did try to run your plugin, you might think right
now that something went wrong: phc appears to
report one class definition too many! However, there is a very good
reason for this. We said earlier that as far as phc is concerned, a PHP script consists of a number
of interface definitions, followed by at least one class definition. So
where does the code that is defined outside of any class go?
The answer is that any code defined outside any class goes into a
special class called %MAIN%. Any functions you define that
do not belong to any class become members of %MAIN%, and
any code you write that does not belong to any function becomes part of
a special method in %MAIN% called %run%. When
phc outputs the tree back to normal PHP code,
%MAIN% disappears. When you work with the tree, however,
there is no distinction between code defined inside and outside classes:
in the tree, everything is defined as part of some class. This makes the
tree simpler and easier to work with.
More details about how the various PHP constructs are represented in
the abstract grammar can be found in Representing PHP.
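If you only want to count the classes that the script's author wrote, one option is to skip %MAIN% while walking the list. The sketch below shows the idea using simplified stand-in types so that it compiles without the phc headers. In particular, the name member is a hypothetical simplification (consult the grammar for how a class definition actually stores its name), and count_user_classes is not a phc function.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Stand-ins for the classes in <phc/ast.h>.
struct AST_interface_def {};
class AST_interface_def_list : public std::list<AST_interface_def*> {};

struct AST_class_def
{
    std::string name; // assumption: simplified; the real class stores the name differently
};
class AST_class_def_list : public std::list<AST_class_def*> {};

struct AST_php_script
{
    AST_interface_def_list* interface_defs;
    AST_class_def_list* class_defs;
};

// Hypothetical helper: counts every class definition except the special %MAIN%.
std::size_t count_user_classes(const AST_php_script* script)
{
    assert(script != 0 && script->class_defs != 0);
    std::size_t n = 0;
    for (AST_class_def_list::const_iterator it = script->class_defs->begin();
         it != script->class_defs->end(); ++it)
    {
        if ((*it)->name != "%MAIN%")
            ++n;
    }
    return n;
}
```

For a script that defines two classes, class_defs would hold three entries (including %MAIN%), and this helper would report two.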
Writing Stand Alone Applications
If you prefer not to write a plugin but want to modify phc itself to derive a new stand-alone
application, you can modify process_ast in
process_ast/process_ast.cpp in the phc source tree instead. This has the effect of
“hardcoding” your plugin into phc
(in versions before 0.1.7, this was the only way to write extensions).
However, in the rest of the tutorials we will assume that you are
writing your extension as a plugin.
What's Next?
In theory, you now know enough to start implementing your own tools
for PHP. Write a new plugin, run it using the
--run option, and optionally pass the
--dump-php option as well to have phc output the tree back to PHP syntax after
executing your plugin.
However, you will probably find that modifying the tree, despite
being well-defined and easy to understand, is actually rather
laborious. It requires a lot of boring boilerplate code. The good news
is that phc provides sophisticated support for
examining and modifying this tree. This is explained in detail in the
follow-up tutorials: