JSP Nightmares

Once upon a time, as a Java newbie, I thought that Java Server Pages were great. Back then I had done web development in Perl and PHP and I was pleasantly surprised by the similarity of the development process. It's as simple as write and run. No bothersome compiler runs and deployment cycles. Java Server Pages seemed like PHP on steroids. Actually, I still think this is a fairly accurate description. It's what the designers of JSP intended: Java's answer to dynamic web scripting languages. The question is whether the JSP design is sound. After having spent several years in the Java Enterprise world, and having maintained large amounts of JSP code, I am convinced that it isn't. So I am writing this article to tell you the untold nightmares of JSP programming.

Before we go into that, let's briefly look at the bright side. JSP technology is tempting for several reasons. First, it is mainstream and it bears Sun's official stamp of approval. Second, it is an integral part of JEE and widely supported by application servers and tools. Third, it is relatively easy to whip up dynamic content with JSP. So, why not use Java Server Pages? Well, there are several problems, the biggest of which is scriptlets. Scriptlets are embedded Java code. Because JSP allows you to mix Java code with markup, you get hybrid files which tend to become messy and difficult to maintain. Few HTML designers understand what's going on in a JSP file. Few programmers are comfortable with mixing their code with reams of markup.

The typical production process for a JSP-based application looks like this: the UI designers finish their prototypes and hand over the resulting markup to the programmers. The programmers tear apart the HTML to make it easier to insert code, repeat sequences, and insert display logic. Once the programmers are finished, there is absolutely no chance that the designers will recognise the markup they produced. It is therefore also unlikely that they will ever touch it again. Even if the finished product contains no scriptlets, JSTL, expression language and (God help us) custom tags will confuse the hell out of designers. This means that the entire process has to be repeated every time a change of visuals is requested. This is, however, not the worst problem.

The biggest drawback is that there is absolutely no way to prevent application logic from creeping into the JSPs, which ultimately leads to spaghetti code. You might think, “Oh well, I know how to code my JSPs properly.” At this point you should remember Murphy's law, especially the part that says: “if something can go wrong, it will go wrong.” Perhaps you are a disciplined individual who wouldn't even dream about putting application logic into a JSP file. But can you say the same about the other programmers in your team? What if work needs to be completed under time pressure? Can you resist the temptation to solve a problem quick-and-dirty by putting a hack into a JSP file along with a “will fix this later” note? I have seen way too many “will fix this later” notes in JSPs and most of them were several years old.

My project consisted of an application with roughly 1.5 million LOC, about half of which was implemented in JSPs. That's about 1500-2000 JSP files. Most files were less than 500 lines, but some were in excess of 2000 lines. Once you hit upon an XL-sized JSP, it's a safe bet that some crucial functionality is buried in it. Reading and understanding a 2000-line JSP can take several hours. Modifying and maintaining it is quite another deal. The project did not attempt to separate business logic from display logic in JSPs, thus XML was liberally mixed with scriptlets. While this approach works OK for narrow functionality that can be coded in a single JSP, it becomes very unwieldy for functionality that spans a larger problem space and, hence, many JSPs.

The “solution” for this was to use scriptlet fragments which are shared by multiple JSPs. This isn't a real solution, however, because it forgoes almost all advantages of using an OOP language, such as abstraction, encapsulation, and inheritance. As a side effect, it produces “ultra-tight coupling” between JSPs that use the same fragments. In theory, this can be made slightly less painful by defining inner classes in fragments for shared functionality. These class fragments are then included in the consuming JSPs at compile time. However, inner classes have their own limitations, and compile-time includes bulk up the resulting bytecode. Since inner classes can access variables and methods in the outer scope, they lack proper encapsulation. Tying them to multiple consumers can lead to some weird dependencies and logical errors. Finally, there is no way to produce unit tests for scriptlets. The only way to test scriptlets is by inserting the test code directly into the JSP, which is obviously insane. In summary, there is no way to code scriptlets cleanly, so it's best to avoid them.
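To make the testability point concrete, here is a minimal sketch of the alternative: logic of the kind that tends to end up in a shared scriptlet fragment, moved into a plain Java class. The class name ShippingCalculator, its method, and the pricing rule are all hypothetical and invented purely for illustration; none of it comes from the project described above.

```java
import java.math.BigDecimal;

// Hypothetical example: logic that might otherwise live in a shared
// scriptlet fragment, moved into a plain class with no JSP dependencies.
final class ShippingCalculator {
    // Illustrative rule (an assumption): orders of 50.00 or more ship free,
    // everything else pays a flat fee.
    private static final BigDecimal FREE_SHIPPING_THRESHOLD = new BigDecimal("50.00");
    private static final BigDecimal FLAT_FEE = new BigDecimal("4.95");

    static BigDecimal shippingFor(BigDecimal orderTotal) {
        if (orderTotal.signum() < 0) {
            throw new IllegalArgumentException("order total must not be negative");
        }
        return orderTotal.compareTo(FREE_SHIPPING_THRESHOLD) >= 0
                ? BigDecimal.ZERO
                : FLAT_FEE;
    }

    // Small demo: prints the fee for an order below and at the threshold.
    public static void main(String[] args) {
        System.out.println(shippingFor(new BigDecimal("19.99")));
        System.out.println(shippingFor(new BigDecimal("50.00")));
    }
}
```

Because the class knows nothing about requests, sessions, or markup, it can be exercised from an ordinary unit test, and the page is left with nothing to do but render the result.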

But even without scriptlets, there's plenty of trouble. For example, there is the JSP Standard Tag Library (JSTL) and the Expression Language (EL), which are supposed to replace scriptlets as coding instruments. In particular, the EL has been praised as enabling clean coding for MVC applications with Java Server Pages as the view component. Well, I disagree. JSTL+EL are neither very clean nor very concise. What is worse, they are too powerful for their own good. JSTL+EL are Turing-complete, just like XSLT which they resemble, which means that replacing scriptlets with JSTL+EL is like jumping from the frying pan into the fire. In addition, JSTL provides tags that allow programmers to access a database and execute queries. If you see MVC going out of the window at this point, you have recognised the problem. In summary, JSTL+EL produce the same problems as scriptlets, but with an XML syntax.

In conclusion, Java Server Pages can lead to maintenance nightmares, especially when used in a large project. While it is possible to code JSPs cleanly, it is apparently not a widespread practice, which is probably Murphy's law taking its toll. Hence, if you are building a new Java web application, think twice about using JSP. If you have a legacy application, you might want to replace JSPs with something more appropriate. In most cases, this can be done gradually, since legacy JSPs can run side by side with an alternative view technology.

Frameworkless Architecture

Perhaps suggesting to eschew web frameworks for web application development is playing the devil’s advocate. Perhaps it is even foolish. To renounce the productivity boost one gets with a properly designed framework does not sound like sensible advice. Only ignorant script kiddies entertain such ideas. Well, for the most part that is true. A web framework does indeed simplify application development if it is chosen well. It does even more if it is designed well. It can provide architectural support for building maintainable applications. It can help with the plumbing and provide conceptual structure to guide the development process.

So, what speaks against using a web framework? Plenty, actually, especially at the lower end of the spectrum and especially with dynamic languages. The main problem with web frameworks is that they add overhead. This means that the added functionality and structure is bought at the cost of performance degradation. The severity of this problem depends on the system architecture. One needs to keep in mind that dynamic languages are interpreted at runtime, which makes them CPU-intensive and relatively slow. Because the life cycle of a script is essentially stateless and single-step, classes and data structures need to be rebuilt and reloaded (in theory) at each request.

In practice, this does not happen, because servers are designed to provide at least rudimentary caching. However, the runtime performance of interpreted languages is typically several orders of magnitude lower than that of a compiled language, which magnifies the problem. To illustrate my point, consider these benchmarks for PHP frameworks kindly provided by Paul M. Jones. According to these figures, a trivial PHP page is served by Apache 2 at a performance reduction of 43% compared to static HTML. The use of various PHP web frameworks further reduces performance by 85% – 95% compared to a PHP page that merely echoes content. Although the relative overhead can be expected to shrink as application code grows more complex, the slowdown is significant.
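The two percentages compound rather than add, which is easy to misread. As a quick worked check, and assuming both figures are reductions in throughput (requests served per second), the sketch below expresses everything relative to static HTML. The class and method names are made up for this illustration; only the three percentages come from the figures quoted above.

```java
// Worked arithmetic for the quoted benchmark figures.
// Throughput is expressed relative to static HTML (= 1.0).
final class FrameworkOverhead {
    // Remaining throughput after a fractional reduction, e.g. 0.43 leaves 0.57.
    static double remaining(double baseline, double reduction) {
        return baseline * (1.0 - reduction);
    }

    public static void main(String[] args) {
        double plainPhp = remaining(1.0, 0.43);        // trivial PHP page
        double lightest = remaining(plainPhp, 0.85);   // framework costing 85%
        double heaviest = remaining(plainPhp, 0.95);   // framework costing 95%
        System.out.printf("plain PHP: %.1f%% of static HTML%n", plainPhp * 100);
        System.out.printf("with framework: %.2f%% to %.2f%% of static HTML%n",
                heaviest * 100, lightest * 100);
    }
}
```

Under that reading, a plain PHP page retains 57% of static throughput, and a framework-driven page retains somewhere between roughly 2.9% and 8.6% of it.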

PHP offers a number of remedies, such as opcode caching, object caching, and products such as Zend Server, APC, and MCache, yet performance is unlikely to get even close to that of a compiled language. Furthermore, there is the question of whether the complexity of the project justifies the complexity introduced by a web framework. Would you use a web framework for building a guestbook script? Probably not. What about blog software? A photo gallery? A bulletin board? These types of applications are the mainstay of dynamic languages, such as PHP. It is the area where PHP really shines. Think of WordPress, phpBB, MediaWiki, Drupal, osCommerce, Coppermine and other popular applications. They all have one thing in common: they don’t use a framework.

Hence, before choosing a web framework for PHP development, it may be worth pondering whether one is required at all. This suggestion may sound a bit contradictory, given that I have just reviewed the Zend Framework in a previous article. However, in my own practice I haven’t come across many complex PHP projects. The commercial PHP projects I worked on during the last 10 years can roughly be divided into three categories: 1. extensions and customisations of open source packages, 2. intranet information systems, and 3. e-commerce systems and “catalogware”.

Although the latter two may be considered candidates for web frameworks, the size of these projects was almost always small enough to do without. On several occasions, I chose to implement an “ultralight” MVC architecture by hand instead of using an out-of-the-box framework. The main reason for this was again performance. The “ultralight” approach is defined by implementing only the required functionality, which results in a highly specialised design.

In practice, this means slimming the controller, reducing DB abstraction to a thin wrapper around the native library, and forgoing a templating system in favour of embedded PHP. The advantage of this approach is that you get separation of presentation and business logic, componentisation, and customisable control flow without the performance cost of a full-blown framework. The disadvantage is that it is slightly more laborious to implement and less flexible. Don’t get me wrong. I have no problems imagining scenarios where I would want to use a PHP web framework such as the Zend Framework. However, in these cases I’d probably be drawn towards using Java or (hopefully) Scala in the first place. In summary, I have found myself using PHP mostly in situations where a web framework seemed dispensable, while I have been using Java mostly in situations where a web framework seemed essential.

Zend Framework Review

Earlier this week, I gave the latest version of the Zend Framework v-1.9.2 another test drive. I had previously dabbled in v-1.7.4 as well as a pre-1.0 incarnation of the framework. I will not list the whole breadth of its functionality again here, since you can find this elsewhere on the Internet. Neither will I present a point-by-point analysis, just the salient points, short and sweet, which you can expect to be coloured by my personal view.

Suffice it to say that the ZF (Zend Framework) is based on MVC (you’d never have guessed) and it provides functionality for database access, authentication and access control, form processing, validation, I/O filtering, web services access, and a bunch of other things you would expect from a web framework. The first thing to notice is that the framework has grown up, and I mean this quite literally: from a few megabytes in its early days to a whopping 109 MB (unzipped) distribution package. Only about 21 MB are used by the framework itself; the rest contains demos, tests, and… the Dojo toolkit… an old acquaintance, which is optional.

The documentation for the ZF was excellent right from the beginning and it has stayed that way. Included is a 1170-page PDF file, which also bears testimony to the growing size and complexity of the framework. Gone are the days when one could hack together a web application without reading a manual. One of the first things to realise is that ZF is a glue framework rather than a full-stack framework. This means it feels more like a library or a toolkit. ZF does not prescribe architecture and programming idioms like many other web frameworks do. This appears to fit the PHP culture well, though it must be mentioned that most ZF idioms come highly recommended, since they represent best OO practices.

Another thing that catches the eye is the lack of an ORM component, which may likewise be rooted in traditional PHP culture. If you want object mapping, you would have to code around ZF’s DB abstraction and use Doctrine, Propel, or something similar. Let’s get started with this item.

Database Persistence
ZF provides a number of classes for DB abstraction. Zend_Db_Table implements a table data gateway using reflection and DB metadata. You only need to define table names and primary keys. Zend_Db_Adapter, Zend_Db_Statement and Zend_Db_Select provide database abstraction and let you create DB-independent queries and SQL statements in an object-oriented manner. However, as you are dealing directly with the DB backend, all your data definitions go into the DB rather than into objects. Although this matches the traditional PHP approach, it means that you need to create schemas by hand, which may irritate people who have been using ORM layers, like Hibernate, for years. On the other hand, a full-blown ORM layer likely incurs a significant performance cost in PHP, so maybe the ZF approach is sane.

Fat Controller
Like many other frameworks, ZF puts a lot of application logic into the controller, and this is my main gripe with the ZF. It seems to be the result of the idea that the “model” should concern itself only with shovelling data from the DB into the application and vice versa. A case in point is the coupling between Zend_Form and validation. This leaves you no option but to put both into the controller. I think that data validation logically belongs to the model, while form generation logically belongs to the view. If you pull this into the middle, it will not only bloat the controller, but it is likely to lead to repetition of validation logic in the long run. That’s why I love slim controllers. Ideally, a controller should do nothing but filtering, URL rewriting, dispatching, and error processing.
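The split argued for here can be sketched in a few lines. This is not ZF code and not PHP; it is a language-neutral illustration written in Java, and the names SignupController and Signup, the view names, and the validation rules are all hypothetical. The point is only where the rules live: in the model, so that the controller stays slim and any other caller gets the same checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical slim controller: it maps input to the model and picks a
// view name; it contains no validation rules of its own.
final class SignupController {
    String handle(String email, int age) {
        Signup signup = new Signup(email, age);
        return signup.validate().isEmpty() ? "signup-ok" : "signup-form";
    }

    public static void main(String[] args) {
        SignupController controller = new SignupController();
        System.out.println(controller.handle("ann@example.org", 30));
        System.out.println(controller.handle("not-an-email", 30));
    }
}

// Hypothetical model class: it owns its validation rules, so every caller
// (web form, import job, web service) applies the same checks.
final class Signup {
    private final String email;
    private final int age;

    Signup(String email, int age) {
        this.email = email;
        this.age = age;
    }

    // Returns a list of human-readable problems; an empty list means valid.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (email == null || !email.contains("@")) {   // deliberately crude check
            errors.add("email is not valid");
        }
        if (age < 18) {                                // illustrative rule only
            errors.add("age must be at least 18");
        }
        return errors;
    }
}
```

With this arrangement a second entry point, say a batch import, can call validate() directly instead of copying the rules out of a controller.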

MVC Implementation
Having mentioned coupling, it would do ZF an injustice to say that things are tightly coupled. Actually, the opposite is the case, as even the MVC implementation is loosely coupled. At the heart you find the Zend_Controller_Front class which is set up to intercept all requests to dynamic content via URL rewriting. The rewriting mechanism also allows user-friendly and SEO-friendly URLs. The front controller dispatches to custom action controllers implemented via Zend_Controller_Action; if non-standard dispatching is required, this can be achieved by implementing a custom router interface with special URL inference rules. The Zend_Controller_Action is aptly named, because that’s where the action is, i.e. where the application accesses the model and does its magic. The controller structure provides hooks and interfaces for the realisation of a plugin architecture.

Views
Views are *.phtml files that contain HTML interspersed with plenty of display code contained in the traditional <? ?> tags. It should be possible to edit *.phtml files with a standard HTML editor. The Zend_View class is a thin object from which view files pull display data. View fragments are stitched together with the traditional PHP require() or with layouts. It is also possible to use a third-party templating system. Given the <? ?> tags, there is little to prevent application logic from creeping into the view, except reminding developers that this is an abominable practice punishable by public ridicule.

Layouts
Layouts are a view abstraction. They enable you to arrange the logical structure of page layouts into neat and clean XML. These layouts are then transformed into suitable output (meaning HTML in most cases). As you can probably infer, this takes a second parsing step inside the PHP application, which is somewhat unfortunate, since PHP itself already parses view components. While layouts are optional, they are definitely nice to have. I think it’s probably the best a framework can do given the language limitations of PHP, which only understands the <?php ?> tag. If the XML capabilities of PHP itself were extended to process namespaced tags like <php:something>, then one could easily create custom tags and the need for performance-eating 2-step processing would probably evaporate. Ah, wouldn’t it be nice?

Ajax Support
ZF does not include its own JavaScript toolkit or set of widgets, but it comes bundled with Dojo and it offers JSON support. The Zend_Json class provides super-simple serialisation of PHP objects to JSON and deserialisation from it. It can also translate XML to JSON. The Zend_Dojo class provides an interface to the Dojo toolkit and makes Dojo’s widgets (called dijits) play nicely with Zend_Form. Of course, you are free to use any other Ajax toolkit instead of Dojo, such as YUI, jQuery, or Prototype.

Flexibility
As mentioned, ZF is very flexible. It’s sort of loosely coupled at the design level, which is both a blessing and a curse. It’s a blessing, because it puts few restrictions on application architecture, and it’s a curse, because it creates gaps for code to fall through. A case in point is dependency injection à la Spring. In short, there isn’t much in the way of dependency management, apart from general OO practices of course. Nothing keeps programmers from having dependencies floating around in global space or in the registry. A slightly more rigid approach that enforces inversion of control when wiring together the Zend components would probably not have hurt.
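For readers who have not met the idea, this is roughly what inversion of control looks like when done by hand, without any container. The sketch is in Java rather than PHP, and every name in it (Wiring, Mailer, LoggingMailer, WelcomeService) is hypothetical; it shows the general technique of constructor injection, not anything ZF or Spring ships.

```java
// All wiring happens in one place; a container such as Spring would
// automate this step, here it is done by hand.
final class Wiring {
    public static void main(String[] args) {
        WelcomeService service = new WelcomeService(new LoggingMailer());
        System.out.println(service.welcome("ann@example.org"));
    }
}

// Hypothetical collaborator interface; programming to an interface lets
// the wiring code decide which implementation is used.
interface Mailer {
    String send(String recipient, String body);
}

final class LoggingMailer implements Mailer {
    @Override
    public String send(String recipient, String body) {
        return "to " + recipient + ": " + body;   // stands in for real delivery
    }
}

// The dependency is passed in through the constructor instead of being
// fetched from a global registry, so it is visible in the class signature.
final class WelcomeService {
    private final Mailer mailer;

    WelcomeService(Mailer mailer) {
        this.mailer = mailer;
    }

    String welcome(String recipient) {
        return mailer.send(recipient, "Welcome!");
    }
}
```

The practical difference from a registry lookup is that WelcomeService cannot be constructed without its collaborator, and a test can hand it a stub instead of the real thing.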

Overall Impression
My overall impression of the ZF is very good. It is a comprehensive and well-designed framework for PHP web applications. What I like best about it is that it offers a 100% object-oriented API that looks very clean and makes extensive use of best OO practices, such as the open/closed principle, programming to interfaces, composition over inheritance, and standard design patterns. The API is easy to read and understand. The internals of its implementation likewise make a good impression. The code looks clean and well structured, which is quite a nice change from PHP legacy code. ZF still involves a non-trivial learning curve because of its size. I’ve only had time to look into the key aspects, and didn’t get around to trying out more specialised features like Zend_Captcha, Zend_Gdata, Zend_Pdf, Zend_Soap, and web services, and all the other features that ZF offers to web developers. If I had to choose a framework for a new web application, ZF would definitely be among the top contenders.