Serve PHP with Tomcat

tomcat-php01.pngAs you can gather from the title of this website, I create software in Java, Scala, and PHP. While Java and Scala compile to bytecode that runs on the same virtual machine, PHP is executed by a separate interpreter. The most efficient way to run PHP scripts is to integrate the interpreter directly into the webserver. Hence, most PHP developers use a local Apache Httpd server with mod_php for development. If you also do Java programming, this raises the problem that you need two different web servers, namely Tomcat (or another web container or appserver) for Java development and Apache for PHP development. Running two servers is a bit of a nuisance. Two servers consume more resources than one and you cannot run both on the same port. This problem can be solved in three different ways: you can only run one server at a time, you can use a different port number for one server which has to be included in the URLs, or you can integrate the two servers. There are again at least three different ways to accomplish the latter: you can proxy requests from Apache to Tomcat, you can proxy request from Tomcat to Apache, or you can use a connector module, such as mod_jk. Of course, maintaining two servers is is more complicated than maintaining one, and the integration adds additional complexity. Fortunately, there is an easier way to integrate PHP and Java web applications. PHP/Java Bridge is a free open source product for the integration of the native PHP interpreter with the Java VM. It is designed with web applications in mind: Java servlets can “talk” to PHP scripts and vice versa. The official website describes it as an “implementation of a streaming, XML-based network protocol which is up to 50 times faster than local RPC via SOAP.” PHP/Java Bride requires no additional components to invoke Java procedures from PHP or vice versa. Although there are a number of different use cases, I am going to describe a particular one in this article, namely how to configure Tomcat with PHP/Java Bridge in order to have Tomcat serve PHP web pages. Let’s start with software requirements. We need the following software packages:

Follow the standard installation procedures for the JVM, Tomcat, and PHP. On Linux, you can use the standard packages for your distribution and on Windows you can use the regular installers. Make sure that both Tomcat and PHP are installed properly, which means that you should see Tomcat’s welcome web page at http://localhost:8080 and you should be able to execute a PHP script via the command line by invoking the standalone “php” command. The PHP/Java Bridge product does not use the regular executable, however, but fast CGI. The fast CGI executable is called php-cgi (or Php.cgi.exe on Windows), so you must make sure that your PHP installation contains it. Then you are all set to install and configure the PHP/Java Bridge. The PHP/Java Bridge package comes with a sample web application named JavaBridge.war. Deploy the application in Tomcat, point your browser to http://localhost:8080/JavaBridge and try out the examples. If this works, you are half-finished. To provide the capability to execute PHP scripts server-wide, not just in a single web application, you need to make some changes to the Tomcat configuration. Find the three jar files named JavaBridge.jar, php-servlet.jar and php-script.jar (look in WEB-INF/lib) and move them to Tomcat’s shared library directory. This is usually found in $CATALINA_HOME/lib (or $CATALINA_HOME/shared in older Tomcat installations). Then edit Tomcat’s conf/web.xml configuration file and add the following lines:

<listener-class>
    php.java.servlet.ContextLoaderListener
  </listener-class>
</listener>

<servlet>
  <servlet-name>PhpJavaServlet</servlet-name>
  <servlet-class>
    php.java.servlet.PhpJavaServlet
  </servlet-class>
</servlet>

<servlet>
  <servlet-name>PhpCGIServlet</servlet-name>
  <servlet-class>
    php.java.servlet.PhpCGIServlet
  </servlet-class>
  <init-param>
<param-name>prefer_system_php_exec</param-name>
<param-value>On</param-value>
  </init-param>
  <init-param>
<param-name>php_include_java</param-name>
<param-value>On</param-value>
  </init-param>
</servlet>

<servlet-mapping>
  <servlet-name>PhpJavaServlet</servlet-name>
  <url-pattern>*.phpjavabridge</url-pattern>
</servlet-mapping>

<servlet-mapping>
  <servlet-name>PhpCGIServlet</servlet-name>
  <url-pattern>*.php</url-pattern>
</servlet-mapping>

This adds the listeners and servlets required for PHP script execution to all web applications. While you are at it, you might also want to enable index.php files to display when a directory URL is requested. Simply add it to the list of welcome files in conf/web.xml. My list looks like this:

<welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
    <welcome-file>index.php</welcome-file>
</welcome-file-list>

Now you can copy PHP scripts into the context root directory of any web application and type the script URL into your browser. I suggest you try a script with phpinfo(). It gives you plenty of useful configuration info. If this doesn’t work and you are on Unix, the problem might be file permissions. On my machine, I had to copy the contents of “java” directory in the JavaBridge webapp manually to the context root directory where PHP applications were installed. This directory contains two files Java.inc and JavaProxy.php. Normally, the PHP/Java Bridge software copies it automatically, but it might not be able to do so if it does not have proper permissions:

~$ ls -lh /var/lib/tomcat6/webapps/ROOT/java
total 136K
-rw-rw-r-- 1 root root 64K 2009-12-30 14:14 Java.inc
-rw-rw-r-- 1 root root 64K 2009-12-30 14:14 JavaProxy.php

Now try calling a PHP script. For example, a script containing the phpinfo() command displays information about the server: tomcat-config.png I have configured my machine to host all PHP web applications in Tomcat’s ROOT context. This eliminates the extra path component of the webapp context, since the ROOT’s context path is “/”. Then I softlinked the folder that contains all my PHP projects into the ROOT webapp directory, so that the actual source files are kept separate from the Tomcat installation. In order to enable Tomcat to follow symlinks, you need to edit the context.xml of the respective web application -in this case ROOT- and add the line: <Context path=”/” allowLinking=”true” /> . Another possible gotcha is Tomcat’s security manager, which is enabled by default on Ubuntu, but not on Windows. Although a security manager is not necessary for most development scenarios, it is highly recommended for production. I consider it good practice to enable the security manager on the development machine, because it allows me to recognise security problems early during development, before the application is deployed on the production server. The downside is that additional configuration may be required, for PHP applications to function properly. The respective configuration files are located in $CATALINA_BASE/conf/policy.d. Most likely, you need to grant PHP web applications write access to files in the document root and possibly other permissions, such as opening sockets, etc. It’s probably safest to do this on a per-application basis.

JSP Nightmares

jsp-nightmare.jpgOnce upon a time as a Java newbie, I thought that Java Server Pages were great. Back then I had done web development in Perl and PHP and I was pleasantly surprised by the similarity of the development process. It's as simple as write and run. No bothersome compiler runs and deployment cycles. Java Server Pages seemed like PHP on steroids. Actually, I still think this is a fairly accurate description. It's what the designers of JSP intended – Java's answer to dynamic web scripting languages. The question is whether the JSP design is sound. After having spent several years in the Java Enterprise world, and having maintained large amounts of JSP code, I am convinced that it isn't. So I am writing this article to tell you the untold nightmares of JSP programming.

Before we are going into that, let's briefly look at the bright side. JSP technology is tempting for several reasons. First, it is mainstream and it bears the official stamp of approval by Sun. Second, it is an integral part of JEE and widely supported by application servers and tools. Third, it it is relatively easy to whip up dynamic content with JSP. So, why not use Java Server Pages? Well, there are several problems, the biggest of which are scriptlets. Scriptlets are embedded Java code. Because JSP allows you to mix Java code with markup, you get hybrid files which tend to become messy and difficult to maintain. Few HTML designers understand what's going on in a JSP file. Few programmers are comfortable with mixing their code with reams of markup.

The typical production process for a JSP-based application looks like this: the UI designers finish their protoypes and hand over the resulting markup to the programmers. The programmers tear apart the HTML to make it easier to insert code, repeat  sequences, and insert display logic. Once the programmers are finished, there is absolutely no chance that the designers will recognise the markup they produced. It is therefore also unlikely that they will ever touch it again. Even if the finished product contains no scriptlets, JSTL, expression language and (God help us) custom tags will confuse the hell out of designers. This means that the entire process has to be repeated,  everytime a change of visuals is requested. This is however not the worst problem.

The biggest drawback is that there is absolutely no way to prevent application logic from creeping into the JSPs which ultimately leads to spaghetti code. You might think, “Oh well, I know how to code my JSPs properly.” At this point you should remember Murphy's law, especially the part that says: “if something can go wrong it will go wrong.” Perhaps you are a disciplined individual who wouldn't even dream about putting application logic into a JSP file. But can you say the same about the other programmers in your team? What if work needs to be completed under time pressure? Can you resist the temptation to solve a problem quick-and-dirty by putting a hack into a JSP file along with a “will fix this later” note? I have seen way too many “will fix this later” notes in JSPs and most of them were several years old.

My project consisted of an application with roughly 1.5 million LOC where about half of the code was implemented in JSPs. That's about 1500-2000 JSP files. Most files were less than 500 lines, but some were in excess of 2000 lines. Once you hit upon an XL-sized JSP, it's a safe bet that some crucial functionality is buried in it. Reading and understanding a 2000 line JSP can take several hours. Modifying and maintaining it is quite another deal. The project did not attempt to separate business logic from display logic in JSPs, thus XML was liberally mixed with scriptlets. While this approach works OK for narrow functionality that can be coded  in a single JSP, it becomes very unwieldy for functionality that spans a larger problem space and, hence, many JSPs.

The “solution” for this was to use scriptlet fragments which are shared by multiple JSPs. This isn't a real solution, however, because it forgoes almost all advantages of using an OOP language, such as abstraction, encapsulation, inheritance, and as a side effect it produces “ultra-tight coupling” between JSPs that use the same fragments. In theory, this can be made slightly less painful by defining inner classes in fragments for shared functionality. These class fragments are then included in the consuming JSPs at compile time. However, inner classes have their own limitations and compile time includes bulk up the resulting byte code. Since inner classes can access variables and methods in the outer scope, they lack proper encapsulation. Tying them to multiple consumers can lead to some weird dependencies and logical errors. Finally, there is no way to produce unit tests for scriptlets. The only way to test scriptlets is by inserting the test code directly into the JSP, which is obviously insane. In summary, there is no way to code scriptlets cleanly, so it's best to avoid them.

But even without scriptlets, there's plenty of trouble. For example, there is the Java standard tag library (JSTL) and the expression language (EL) which are supposed to replace scriptlets as coding instruments. In particular, the EL has been praised as enabling clean coding for MVC applications with Java Server Pages as view component. – Well, I disagree. – JSTL+EL are neither very clean nor very concise. What is worse, they are too powerful for their own good. JSTL+EL are Turing-complete, just like XSLT which they resemble, which means that replacing scriptlets with JSTL+EL is like jumping from the frying pan into the fire. In addition, JSTL provides tags that allow programmers to access a database and execute queries. If you see MVC going out of the window at this point, you have recognised the problem. In summary, JSTL+EL produce the same problems as scriptlets, but with an XML syntax.

In conclusion, Java Server Pages can lead to maintenance nightmares, especially when used in a large project. While it is possible to code JSPs cleanly, it is apparently not a widespread practice, which is probably Murphy's law taking its toll. Hence, if you are building a new Java web application, think twice about using JSP. If you have a legacy application, you might want to replace JSPs by something more appropriate. In most cases, this can be done gradually whereas legacy JSPs can run side by side with an alternative view technology.

Performance: Java vs. PHP vs. Scala

The topic of performance of specific programming languages has provided material for endless debates among developers. Proponents of a certain language often cite examples that show their chosen language in a particularly good light. For example, I have come across the claim that PHP and Ruby are performing just as well as Java. Often these claims are supported by nothing but a firm tone of voice and a stern mien.

Thankfully, we have benchmarks. A playful yet credible approach to performance measurement was taken by this website. It contains an array of typical benchmark algorithms implemented in 35 popular (and not so popular) programming languages all running on various Linux servers. There are many complexities involved in measuring programming languages execution speed. The web development arena adds even more complexities. Much depends not only on how the algorithms are implemented, but also on how the compiler and the server are configured. For example, there are different JVMs, there are script caches, accelerators, server tunings, and what not. The languages I’d like to compare here are Java, Scala, and PHP. You will find comparison graphs for each of these below.

Three charts: 1. Java vs. PHP 2. Scala vs. PHP 3. Scala vs. Java Java, Scala, PHP Benchmarks

Three bar groups: 1. Execution time 2. Memory use 3. Size of source code Bar pointing down means: faster performance, lower memory usage, and reduced source size. Basically, these charts confirm what experienced programmers already know. Both, Java and Scala outperform PHP by several magnitudes. To be precise, Java and Scala execute 28 times faster on average.

The performance of Java and Scala is similar, which has to be expected since both languages run on the Java virtual machine. Java gives slightly better performance in some algorithms while its memory requirements are lower. For PHP fans who find these benchmarks disappointing, there is some consolation. PHP appears to be less of a memory hogger, and perhaps more importantly, it performs quite favorably when compared to other dynamic interpreted languages, such as Perl, and Ruby. You can compare more language benchmarks yourself at the following URL: http://shootout.alioth.debian.org/u64q/benchmark.php.

JSF – Productivity vs. Scalability

In my everlasting quest to find the perfect web framework (it’s futile, I know!) I spent a few more weekends, this time with Java Server Faces. I first got acquainted with JSF two years ago when I played around with Java Studio Creator and I remember being quite impressed by it. I even used it in a small project, which taught me a lot about the practical side of JSF. However, I did not get a deeper understanding until recently when I read Java Server Faces in Action by Kito D. Mann and Core Java Server Faces by David Geary and Cay Horstmann, both excellent books by the way.

I like RAD tools ever since I worked with Delphi more than a decade ago. In many ways, Delphi was the perfect solution for creating Windows GUI applications. It allowed me to create high-performance apps with great ease and tremendous productivity. When I first came across JSF, I realised it holds a similar promise for the field of web development. In my view, RAD is the most attractive way to create applications. It many not be the best choice for every type of application, but I am sure it is the best choice for a large percentage of run-of-the-mill commercial apps. The UI-centred approach of RAD tools makes prototyping very easy and it gives development projects a certain “users first” bent which I always found useful in project management.

When communicating with customers, a quick mock up screen always beats a bunch of UML diagrams. In addition, I  like the idea of high-level encapsulation offered by the visual component concept that RAD tools are known for. This high-level encapsulation, if properly implemented, should be able to be extended  to non-visual components, so that business logic might benefit from the same mechanism. Of course, in Java all you need are Java Beans (POJOs) for this purpose.

JFS is the attempt to apply a GUI-like component model to web applications. It has an extensible set of visual components, such as text boxes, list boxes, tabs, AJAX widgets, and data-aware widgets. Strictly speaking, the widgets themselves sit on top of JSF. JSF defines the component architecture and an application framework that includes things such as validators, converters, events, listeners and navigation rules. Like GUI widgets, JSF widgets are stateful. The JSF architecture makes this completely transparent to the application. This means, the application is oblivious of the stateless HTTP protocol, or whatever other protocol might be used. It doesn’t need to take care of preserving and/or restoring state. Sounds a bit like magic, doesn’t it? If you look into the internals of JFS, however, it becomes clear that this magic is created at a heavy cost. The first thing to look at is the 6-phase JSF request lifecycle:

  1. Restore View
  2. Apply Request Values
  3. Process Validations
  4. Update Model Values
  5. Invoke Application
  6. Render Response

Most of these phases do event processing. Quite a bit of additional processing compared to an average web app. But that is only part of the problem. The reasons for doubt lie in the first phase called “restore view”. It implies that the server maintains an internal representation of the view. When the view is restored, JSF creates a component tree in memory that represents the client view.

If you think, this might be a bit memory-intensive, you are probably right. Unfortunately, JSF doesn’t stop there. Component state is typically stored in what is called managed beans. A managed bean is merely an conceptual extension of a JSP backing bean. “Managed” means that the bean is declared in an XML file and that it is allocated automatically when needed by the JSF framework. However, it is not automatically deallocated, unless it is in page or request scope. Sounds like a memory eater? To me it does. Since managed beans are typically allocated in session scope, I imagine that a single client can quickly gobble up many Megabytes or more on the server.

Of course you need session data in any web app. In this regard, JSF isn’t different from other web apps. However, with dozens or hundreds of beans floating around in session memory and new ones being added without direct application intervention, I’d argue that the programmer loses fine-grained control over memory requirements. I can’t help asking: does JSF scale?

Many people think it doesn’t. One might even ask further, whether replicating client state on the server is scalable at all. Is it the right approach? I can’t answer the last question, but I’ll think twice about using JSF in future projects. Probably one doesn’t need to be concerned about applications that don’t go beyond the departmental level with a few dozen concurrent users. But what about a few hundred or a few thousand concurrent users? Even if you create an application for small scale use, it’s not an attractive choice to use a non-scalable architecture. Who knows how the application will eventually evolve and who will use it? As things stand now, JSF seems like a trade-off to me. You get RAD productivity at the expense of scalability. Or isn’t this the case? I think the burden of proof is with the JSF camp.

Java vs. PHP vs. Scala

I have a bit of a dilemma with programming languages. Next year, I expect to be able to free up a little extra time for a private programming project (call me an optimist!) and I am wondering which language/technology to use. The project is quite straightforward. It's a business application that I use for my own work as a software engineer. It consists of four components. There's a contact manager component (or CRM as it's now fashionably called), a project management component, a time tracking component, and a billing component. That may sound like a tall order, but obviously I don't need the full-blown functionality of applications like Siebel, MS Project, or SAP. I just need an application that brings certain functionality together in a quite specific way to suit my needs.

The software I am currently using for this purpose consists of two different programs. The CRM and billing components are contained in a Delphi application which I wrote more than 10 years ago. The time sheet and project management components are part of a PHP application that I developed in 2002. Needless to say that these two programs are neither cutting-edge, nor are they well integrated. The Delphi application uses an outdated Borland Paradox DB  and the PHP application contains large swathes of ugly procedural code. Although the whole shebang fulfils its purpose, I feel it's high time for a replacement. Of course, I could acquire an existing software package and save a lot of time writing code myself. But hey, I am a software engineer. I do like a creative challenge and I want something that fits my needs. I also want to learn new technologies.

The question I am asking myself now is what to use for the task. I am considering Java, PHP, and Scala. There are pros and cons for each of these:

(1) Java, JSP and a web framework with an app server. This is the obvious choice. Most of my professional work is JEE-based these days. I believe that I can work productively with Java, although the language inevitably involves a lot of boilerplate code and redundancy, which has a negative impact on productivity. In spite of this, it would be an good opportunity to deepen my knowledge of JSF (Java Server Faces), Hibernate, or try out some other persistence layer. It would also offer an opportunity to learn a new Java web framework that I haven't yet worked with such as Spring or Tapestry. From a business point of view, this may be a good choice because Java technologies are in high demand and it is also a very robust platform. The JEE universe is really quite large and there's enough territory that would be fun to explore. The downside is that Java, the language, is slightly tedious.

(2) The second choice is PHP and the Zend framework in combination with some AJAX toolkit, such as YUI or Dojo. I have the feeling this would be the most productive way to go; the biggest bang for the buck so to speak. For a project of this size (around 50 kloc), the development time may be even half of that with Java. PHP 5 and the Zend framework are mature technologies and I am quite familiar with both. Another advantage of PHP is that it's wide spread. Almost every hosting company offers PHP, whereas the number of Java hosting companies is considerably smaller (and usually more expensive). So, there wouldn't be any problem hosting the finished product anywhere. The downside is that PHP, being a dynamic language, is less robust  and slower than JVM bytecode. The language is also less expressive. But the biggest disadvantage is that I'd hardly learn anything new in the process.

(3) The third alternative is using Scala in combination with the Lift framework and a standard web container. I find this the most exciting choice, but it's very likely to be the most time consuming. I am rather new to Scala and functional programming. What I have seen so far is great. Programming in Scala is much more fun than coding in Java or PHP. I am afraid though, it would take a bit of time to wrap my head around it and work productively. Scala is still a foreign language to me. Another downside is that there is a limited choice of frameworks, APIs, and tools available at this point. Actually, Lift is the only Scala web framework I know of. Another question I am asking myself is whether acquiring Scala skills does make any business sense. I haven't seen too many Scala job offerings so far. Seems like the most fun choice, but also the least promising from a business point of view. Decisions, decisions, decisions…