PHP is not Java

The fact that PHP is not Java is self-evident and does not need much emphasizing. Nevertheless, developers who come from a Java/OOP background sometimes treat PHP as if it were Java with dollar signs. For example, one finds not just generic design patterns such as singletons, factories, decorators, etc. in contemporary PHP applications, but also patterns directly taken from the JEE world, such as transfer objects, domain stores, filters, session facades, dispatchers. Even typical Java architectures involving separate business, service, persistence, presentation and integration tiers has been brought to the world of PHP application development. This trend began when PHP became a fully fledged OOP language with version 5. PHP developers seem to have looked to the Java world for inspiration ever since. Possibly it’s a new generation of graduates that was taught to think in Java, or maybe it’s just the overwhelming influence of the mainstream OOP body of thought. I can’t say for sure.

I like Java, but I don’t like PHP code that is written as if it were Java. PHP is great because it makes easy things easy. I believe that this simplicity is an asset, not a flaw. PHP provides a straightforward approach to web development, and I dare say that its success is founded on this principle. PHP gives you productivity, ease-of-use, a high level of abstraction, powerful well-tested libraries and backward compatibility. It’s a pragmatic language, not necessarily a beautiful one. PHP’s share-nothing approach may limit its use in specific areas such as concurrency and parallel processing, but it is ideally suited to the request life cycle model of the web, and it frees the programmer from a lot of potential headaches that come with concurrency. This share-nothing approach also implies that scripts are running in a virtual bubble. A malfunctioning or malicious script doesn’t bring down the entire virtual machine like on the Java platform. This property makes it perfect for shared hosting, and consequently, PHP hosting is ubiquitous and cheap.

PHP != Java

Now, let’s look at some Java-esque PHP code:

abstract class Observable {

  private $observers = array();

  public function addObserver(Observer & $observer) {
         array_push($this->observers, $observer);
  }

  public function notifyObservers() {
         for ($i = 0; $i < count($this->observers); $i++) {
                 $widget = $this->observers[$i];
                 $widget->update($this);
         }
     }
}

class DataSource extends Observable {

  private $names;
  private $prices;
  private $years;

  function __construct() {
         $this->names = array();
         $this->prices = array();
         $this->years = array();
  }

  public function addRecord($name, $price, $year) {
         array_push($this->names, $name);
         array_push($this->prices, $price);
         array_push($this->years, $year);
         $this->notifyObservers();
  }

  public function getData() {
         return array($this->names, $this->prices, $this->years);
  }
}

This is an attempt at the observer pattern taken straight from a popular PHP5 textbook. It is obvious that the implementation doesn’t do anything useful. Since the code is meant for illustration, this is forgiveable. The principal flaw is that in PHP, the observer pattern is for the birds. There are two reasons. First, the observer pattern is fundamentally a Java crutch that compensates for the lack of function arguments in Java. If you find yourself writing an observer in PHP, it might have escaped you that PHP supports function parameters and that you can implement the same functionality more concisely with a simple callback function. The second reason goes somewhat deeper into software design considerations. The observer pattern is typically used in an event-driven design, for example for maintaining the state of GUI widgets or in an event dispatcher for container objects. Such designs are rarely ever useful in PHP, because all PHP objects live in the request scope by default. This means that after the request cycle is completed, the objects -including infrastructure objects such as observers, are deallocated.

The fact that PHP objects don’t survive requests has another important implication: the amount of objects in an application is proportional to CPU consumption, because all objects have to be rebuilt at every request, and unlike in Java, there is no class loader that takes care of these things for you. Therefore, incorporating plenty of Java-esque OOP designs into PHP applications has an adverse impact on performance and scalability. Given that the performance of interpreted PHP is already a magnitude below that of JVM bytecode, you can quickly run into scalability issues with an application that serves a large user base. Putting objects into session scope is no solution, because PHP sessions are serialised/deserialised upon each request. The performance penalty of this operation is likely to be worse than explicit object construction.

The solution to the PHP scalability problem lies in using sensible caching, which is of course what every heavy traffic application does. Products such as memcache and APC provide in-memory data stores for objects. Caching, however, bumps up the complexity of persistence logic in your application considerably. It’s not a plug-and-play solution and it doesn’t provide access to shared data. On a personal note, if I was at the point where I had to consider PHP caching, I would probably already regret not having used a JVM based language in the first place. But let’s get back to the topic. Java developers are used to manipulate data with special purpose data structures called collections. PHP developers, on the other hand, generally use only one bread-and-butter data structure: the associative array. This data structure is an ordered map with either integer or string keys. It’s values can be of any type, and of course, arrays can contain mixed value types. Associative arrays can be used to mimic any collection type, such as lists, maps, queues, stacks, etc., even recursive structures such as trees, because array values can be arrays themselves. So there is no need for special purpose collections and this obviously makes things really simple. As a Java programmer, one should probably resist the temptation to define special purpose collections on top of associative arrays. Why? – It’s not the PHP way.

This statement may require some explanation. You might ask: what keeps you from inserting an element at the head or in the middle of a queue if you use associative arrays? Well, technically nothing. The issue is not about what can be done, but what should be done. It’s has to do with the PHP philosophy. PHP is NOT about enforcing programming by contract. It is a dynamic language, that treats data very loosely. This approach has its benefits and its dangers. If you don’t like it, it may be better to use a statically typed language. What I am saying here is, that it is generally wiser to leverage the strengths of a programming language to the greatest degree possible, rather than trying to compensate for its weaknesses. Dynamic languages have much to offer, but forcing a programming by contract approach onto PHP is painful. Consider the following example:

function recordVote($ballotId, $vote, DateTime $timestamp)
{
   if (!is_int($ballotId))
    throw new InvalidArgumentException(
      "ballotId must be an integer");
  if (!is_string($vote))
    throw new InvalidArgumentException(
      "vote must be a string");
  ...
  return $something;
}

This method is doing type checking the hard way. Version 5 of the language introduced a partial type checking syntax called type hinting. Unfortunately, this works only for objects, such as the $timestamp parameter in the above code snippet. The introduction of PHP type hinting did not only dissolve the dynamic paradigm of the PHP language, but it’s implementation is also pitifully inconsistent, because it does not provide for the checking of scalar types, such as int or string. In practice, it often the wrong scalar arguments that cause bugs which are difficult to detect. Wrong object arguments are detected easily in 90% of all cases when references to non-existing methods or properties are invoked. The recordVote() method tries to compensate for the former lack by explicitly checking the arguments using is_int() and is_string(). This produces explicit error messages at run time, but the approach is painful and entails some serious trade-offs.

Most importantly, the caller is now forced to pass specific types which require type casts in all places where the method is called. These casts are easily forgotten and therefore can cause bugs themselves. Did I mention they are also cumbersome to insert? Secondly, the method loses the capacity of implicit overloading, which means that its reusability is seriously curtailed. For example, if I wanted the method to accept timestamps in other formats, I would have to either add additional methods with different names and signatures, or create an adapter. Adding different methods for the same task results in code duplication. Creating an adapter makes the code harder to understand. Both solutions increase verbosity and reduce clarity. A more elegant and more PHP-like solution to this problem would be to replace syntactic checks by semantic checks, make the method signature more dynamic, which gives caller a richer yet still exact API with mixed type parameters:

function recordVote($ballotId, $vote, $timestamp)
{
  if (!is_numeric($ballotId))
    throw new InvalidArgumentException(
      "ballotId must be numeric");
  $ballotId = (int) $ballotId;
  if (empty($vote))
    throw new InvalidArgumentException(
      "vote must not be empty");
  $vote = (string) $vote;
  if ($timestamp instanceof DateTime)
    $timestamp = doSomething($timestamp);
  else if (is_string($timestamp))
    $timestamp = doSomethingElse($timestamp);
  else throw new InvalidArgumentException(
    "timestamp not valid");
  ...
  return $something;
}

Serve PHP with Tomcat

tomcat-php01.pngAs you can gather from the title of this website, I create software in Java, Scala, and PHP. While Java and Scala compile to bytecode that runs on the same virtual machine, PHP is executed by a separate interpreter. The most efficient way to run PHP scripts is to integrate the interpreter directly into the webserver. Hence, most PHP developers use a local Apache Httpd server with mod_php for development. If you also do Java programming, this raises the problem that you need two different web servers, namely Tomcat (or another web container or appserver) for Java development and Apache for PHP development. Running two servers is a bit of a nuisance. Two servers consume more resources than one and you cannot run both on the same port. This problem can be solved in three different ways: you can only run one server at a time, you can use a different port number for one server which has to be included in the URLs, or you can integrate the two servers. There are again at least three different ways to accomplish the latter: you can proxy requests from Apache to Tomcat, you can proxy request from Tomcat to Apache, or you can use a connector module, such as mod_jk. Of course, maintaining two servers is is more complicated than maintaining one, and the integration adds additional complexity. Fortunately, there is an easier way to integrate PHP and Java web applications. PHP/Java Bridge is a free open source product for the integration of the native PHP interpreter with the Java VM. It is designed with web applications in mind: Java servlets can “talk” to PHP scripts and vice versa. The official website describes it as an “implementation of a streaming, XML-based network protocol which is up to 50 times faster than local RPC via SOAP.” PHP/Java Bride requires no additional components to invoke Java procedures from PHP or vice versa. Although there are a number of different use cases, I am going to describe a particular one in this article, namely how to configure Tomcat with PHP/Java Bridge in order to have Tomcat serve PHP web pages. Let’s start with software requirements. We need the following software packages:

Follow the standard installation procedures for the JVM, Tomcat, and PHP. On Linux, you can use the standard packages for your distribution and on Windows you can use the regular installers. Make sure that both Tomcat and PHP are installed properly, which means that you should see Tomcat’s welcome web page at http://localhost:8080 and you should be able to execute a PHP script via the command line by invoking the standalone “php” command. The PHP/Java Bridge product does not use the regular executable, however, but fast CGI. The fast CGI executable is called php-cgi (or Php.cgi.exe on Windows), so you must make sure that your PHP installation contains it. Then you are all set to install and configure the PHP/Java Bridge. The PHP/Java Bridge package comes with a sample web application named JavaBridge.war. Deploy the application in Tomcat, point your browser to http://localhost:8080/JavaBridge and try out the examples. If this works, you are half-finished. To provide the capability to execute PHP scripts server-wide, not just in a single web application, you need to make some changes to the Tomcat configuration. Find the three jar files named JavaBridge.jar, php-servlet.jar and php-script.jar (look in WEB-INF/lib) and move them to Tomcat’s shared library directory. This is usually found in $CATALINA_HOME/lib (or $CATALINA_HOME/shared in older Tomcat installations). Then edit Tomcat’s conf/web.xml configuration file and add the following lines:

<listener-class>
    php.java.servlet.ContextLoaderListener
  </listener-class>
</listener>

<servlet>
  <servlet-name>PhpJavaServlet</servlet-name>
  <servlet-class>
    php.java.servlet.PhpJavaServlet
  </servlet-class>
</servlet>

<servlet>
  <servlet-name>PhpCGIServlet</servlet-name>
  <servlet-class>
    php.java.servlet.PhpCGIServlet
  </servlet-class>
  <init-param>
<param-name>prefer_system_php_exec</param-name>
<param-value>On</param-value>
  </init-param>
  <init-param>
<param-name>php_include_java</param-name>
<param-value>On</param-value>
  </init-param>
</servlet>

<servlet-mapping>
  <servlet-name>PhpJavaServlet</servlet-name>
  <url-pattern>*.phpjavabridge</url-pattern>
</servlet-mapping>

<servlet-mapping>
  <servlet-name>PhpCGIServlet</servlet-name>
  <url-pattern>*.php</url-pattern>
</servlet-mapping>

This adds the listeners and servlets required for PHP script execution to all web applications. While you are at it, you might also want to enable index.php files to display when a directory URL is requested. Simply add it to the list of welcome files in conf/web.xml. My list looks like this:

<welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
    <welcome-file>index.php</welcome-file>
</welcome-file-list>

Now you can copy PHP scripts into the context root directory of any web application and type the script URL into your browser. I suggest you try a script with phpinfo(). It gives you plenty of useful configuration info. If this doesn’t work and you are on Unix, the problem might be file permissions. On my machine, I had to copy the contents of “java” directory in the JavaBridge webapp manually to the context root directory where PHP applications were installed. This directory contains two files Java.inc and JavaProxy.php. Normally, the PHP/Java Bridge software copies it automatically, but it might not be able to do so if it does not have proper permissions:

~$ ls -lh /var/lib/tomcat6/webapps/ROOT/java
total 136K
-rw-rw-r-- 1 root root 64K 2009-12-30 14:14 Java.inc
-rw-rw-r-- 1 root root 64K 2009-12-30 14:14 JavaProxy.php

Now try calling a PHP script. For example, a script containing the phpinfo() command displays information about the server: tomcat-config.png I have configured my machine to host all PHP web applications in Tomcat’s ROOT context. This eliminates the extra path component of the webapp context, since the ROOT’s context path is “/”. Then I softlinked the folder that contains all my PHP projects into the ROOT webapp directory, so that the actual source files are kept separate from the Tomcat installation. In order to enable Tomcat to follow symlinks, you need to edit the context.xml of the respective web application -in this case ROOT- and add the line: <Context path=”/” allowLinking=”true” /> . Another possible gotcha is Tomcat’s security manager, which is enabled by default on Ubuntu, but not on Windows. Although a security manager is not necessary for most development scenarios, it is highly recommended for production. I consider it good practice to enable the security manager on the development machine, because it allows me to recognise security problems early during development, before the application is deployed on the production server. The downside is that additional configuration may be required, for PHP applications to function properly. The respective configuration files are located in $CATALINA_BASE/conf/policy.d. Most likely, you need to grant PHP web applications write access to files in the document root and possibly other permissions, such as opening sockets, etc. It’s probably safest to do this on a per-application basis.

Zend Framework Review

zend-framework.gifEarlier this week, I gave the latest version of the Zend Framework v-1.9.2 another test drive. I had previously dabbled in v-1.7.4 as well as a pre-1.0 incarnation of the framework. I will not repeat listing the whole breadth of its functionality here, since you can find this elsewhere on the Internet. Neither will I present a point-by-point analysis, just the salient points, short and sweet, which you can expect to be coloured by my personal view.

Suffice to say that the ZF (Zend Famework) is based on MVC -you’d never guessed- and it provides functionality for database access, authentication and access control, form processing, validation, I/O filtering, web services access, and a bunch of other things you would expect from a web framework. The first thing to notice is that the framework has grown up and I mean this quite literally from a few megabytes in its early days to a whopping 109 MB (unzipped) distribution package. Only about 21 MB are used by the framework itself; the rest contains demos, tests, and… the dojo toolkit… an old acquaintance, which is optional.

The documentation for the ZF was excellent right from the beginning and it has staid that way. Included is a 1170-pages PDF file, which also bears testimony to the growing size and complexity of the framework. Gone are the days when one could hack together a web application without reading a manual. One of the first things to realise is that ZF is glue-framework rather than a full-stack framework. This means, it feels more like a library or a toolkit. ZF does not prescribe architecture and programming idioms like many other web frameworks do. This appears to fit the PHP culture well, though it must be mentioned that most ZF idioms come highly recommended, since they represent best OO practices.

Another thing that catches the eye is the lack of an ORM component, which may likewise be rooted in traditional PHP culture. If you want object mapping, you would have to code around ZF’s DB abstraction and use Doctrine, Propel, or something similar. Let’s get started with this item.

Database Persistence
ZF provides a number of classes for DB abstraction. Zend_Db_Table implements a table data gateway using reflection and DB metadata. You only need to define table names and primary keys. Zend_Db_Adapter, Zend_Db_Statement and Zend_Db_Select provide database abstraction and let you create DB-independent queries and SQL statements in an object oriented manner. However, as you are dealing directly with the DB backend, all your data definitions go into the DB rather than into objects. Although this matches with the traditional PHP approach, it means that you need to create schemas by hand, which may irritate people who have been using ORM layers, like Hibernate, for years. On the other hand, a full-blown ORM layer likely incurs a significant performance cost in PHP, so maybe the ZF approach is sane.

Fat Controller
Like many other frameworks, ZF puts a lot of application logic into the controller, and this is my main gripe with the ZF. It seems to be the result of the idea that the “model” should concern itself only with shovelling data from the DB into the application and vice versa. A case in point is the coupling between Zend_Form and validation. This leaves you no option, but to put both into the controller. I think that data validation logically belongs to the model, while form generation logically belongs to the view. If you pull this into the middle, it will not only bulge the controller, but it is likely to lead to repetition of validation logic in the long run. That’s why I love slim controllers. Ideally, a controller should do nothing but filtering, URL rewriting, dispatching, and error processing.

MVC Implementation
Having mentioned coupling, it would do ZF injustice to say that things are tightly coupled. Actually, the opposite is the case, as even the MVC implementation is loosely coupled. At the heart you find the Zend_Controller_Front class which is set up to intercept all requests to dynamic content via URL rewriting. The rewriting mechanism also allows user-friendly and SEO-friendly URLs. The front controller dispatches to custom action controllers implemented via Zend_Controller_Action; if non-standard dispatching is required this can be achieved by implementing a custom router interface with special URL inference rules. The Zend_Controller_Action is aptly named, because that’s where the action is, i.e. where the application accesses the model and does its magic. The controller structure provides hooks and interfaces for the realisation of a plugin architecture.

Views
Views are *.phtml files that contain HTML interspersed with plenty of display code contained in the traditional <? ?> tags. It should be possible to edit *.phtml files with a standard HTML editor. The Zend_View class is a thin object from which View files pull display data. View fragments are stitched together with the traditional PHP require() or with layouts. It is also possible to use a 3rd party templating system. Given the <? ?>, there is little to prevent application logic from creeping into the view, except reminding developers that this is an abominable practice punishable by public ridicule.

Layouts
Layouts are a view abstraction. They enable you to arrange the logical structure of page layouts into neat and clean XML. These layouts are then transformed into suitable output (meaning HTML in most cases). As you can probably infer, this takes a second parsing step inside the PHP application, which is somewhat unfortunate, since PHP itself already parses view components. While layouts are optional, they are definitely nice to have. I think it’s probably the best a framework can do given the language limitations of PHP, which only understands the <?php> tag. If the XML capabilities of PHP itself would be extended to process namespaced tags like <php:something>, then one could easily create custom tags and the need for performance-eating 2-step processing would probably evaporate. Ah, wouldn’t it be nice?

Ajax Support
ZF does not include its own Javascript toolkit or set of widgets, but it comes bundled with Dojo and it offers JSON support. The Zend_Json class provides super-simple PHP object serialisation and deserialisation from/to JSON. It can also translate XML to JSON. The Zend_Dojo class provides an interface to the Dojo toolkit and makes Dojo’s widgets (called dijits) play nicely with Zend_Forms. Of course, you are free to use any other Ajax toolkit instead of Dojo, such as YUI, jQuery, or Prototype.

Flexibility
As mentioned, ZF is very flexible. It’s sort of loosely coupled at the design level, which is both a blessing and a curse. It’s a blessing, because it puts few restrictions on application architecture, and it’s a curse, because it creates gaps for code to fall through. A case in point is dependency injection ala Spring. In short, there isn’t much in the way of dependency management, apart from general OO practices of course. Nothing keeps programmers from having dependencies floating around in global space or in the registry. A slightly more rigid approach that enforces inversion of control when wiring together the Zend components would  probably not have hurt.

Overall Impression
My overall impression of the ZF is very good. It is a comprehensive and well-designed framework for PHP web applications. What I like best about it that it offers a 100% object-oriented API that looks very clean and makes extensive use of best OO practices, such as open/closed principle, programming to interfaces, composi
tion over inheritance, and standard design patterns. The API is easy to read and understand. The internals of its implementation likewise make a good impression. The code looks clean and well structured, which is quite a nice change from PHP legacy code. ZF still involves a non-trivial learning curve because of its size. I’ve only had time to look into the key aspects, and didn’t get around to try out more specialised features like Zend_Captcha, Zend_Gdata, Zend_Pdf, Zend_Soap, and web services, and all the other features that ZF offers to web developers. If I had to choose a framework for a new web application, ZF would definitely be among the top contenders.

Java vs. PHP vs. Scala

I have a bit of a dilemma with programming languages. Next year, I expect to be able to free up a little extra time for a private programming project (call me an optimist!) and I am wondering which language/technology to use. The project is quite straightforward. It's a business application that I use for my own work as a software engineer. It consists of four components. There's a contact manager component (or CRM as it's now fashionably called), a project management component, a time tracking component, and a billing component. That may sound like a tall order, but obviously I don't need the full-blown functionality of applications like Siebel, MS Project, or SAP. I just need an application that brings certain functionality together in a quite specific way to suit my needs.

The software I am currently using for this purpose consists of two different programs. The CRM and billing components are contained in a Delphi application which I wrote more than 10 years ago. The time sheet and project management components are part of a PHP application that I developed in 2002. Needless to say that these two programs are neither cutting-edge, nor are they well integrated. The Delphi application uses an outdated Borland Paradox DB  and the PHP application contains large swathes of ugly procedural code. Although the whole shebang fulfils its purpose, I feel it's high time for a replacement. Of course, I could acquire an existing software package and save a lot of time writing code myself. But hey, I am a software engineer. I do like a creative challenge and I want something that fits my needs. I also want to learn new technologies.

The question I am asking myself now is what to use for the task. I am considering Java, PHP, and Scala. There are pros and cons for each of these:

(1) Java, JSP and a web framework with an app server. This is the obvious choice. Most of my professional work is JEE-based these days. I believe that I can work productively with Java, although the language inevitably involves a lot of boilerplate code and redundancy, which has a negative impact on productivity. In spite of this, it would be an good opportunity to deepen my knowledge of JSF (Java Server Faces), Hibernate, or try out some other persistence layer. It would also offer an opportunity to learn a new Java web framework that I haven't yet worked with such as Spring or Tapestry. From a business point of view, this may be a good choice because Java technologies are in high demand and it is also a very robust platform. The JEE universe is really quite large and there's enough territory that would be fun to explore. The downside is that Java, the language, is slightly tedious.

(2) The second choice is PHP and the Zend framework in combination with some AJAX toolkit, such as YUI or Dojo. I have the feeling this would be the most productive way to go; the biggest bang for the buck so to speak. For a project of this size (around 50 kloc), the development time may be even half of that with Java. PHP 5 and the Zend framework are mature technologies and I am quite familiar with both. Another advantage of PHP is that it's wide spread. Almost every hosting company offers PHP, whereas the number of Java hosting companies is considerably smaller (and usually more expensive). So, there wouldn't be any problem hosting the finished product anywhere. The downside is that PHP, being a dynamic language, is less robust  and slower than JVM bytecode. The language is also less expressive. But the biggest disadvantage is that I'd hardly learn anything new in the process.

(3) The third alternative is using Scala in combination with the Lift framework and a standard web container. I find this the most exciting choice, but it's very likely to be the most time consuming. I am rather new to Scala and functional programming. What I have seen so far is great. Programming in Scala is much more fun than coding in Java or PHP. I am afraid though, it would take a bit of time to wrap my head around it and work productively. Scala is still a foreign language to me. Another downside is that there is a limited choice of frameworks, APIs, and tools available at this point. Actually, Lift is the only Scala web framework I know of. Another question I am asking myself is whether acquiring Scala skills does make any business sense. I haven't seen too many Scala job offerings so far. Seems like the most fun choice, but also the least promising from a business point of view. Decisions, decisions, decisions…