Frameworkless Architecture

Perhaps suggesting that one eschew web frameworks for web application development is playing devil’s advocate. Perhaps it is even foolish. Renouncing the productivity boost one gets from a properly designed framework does not sound like sensible advice. Only ignorant script kiddies entertain such ideas. Well, for the most part that is true. A web framework does indeed simplify application development if it is chosen well. It does even more if it is designed well. It can provide architectural support for building maintainable applications. It can help with the plumbing and provide conceptual structure to guide the development process.

So, what speaks against using a web framework? Plenty, actually, especially at the lower end of the spectrum and especially with dynamic languages. The main problem with web frameworks is that they add overhead: the added functionality and structure is bought at the cost of degraded performance. How grave this problem is depends on the system architecture. One needs to keep in mind that dynamic languages are interpreted at runtime, which makes them CPU-intensive and relatively slow. Because the life cycle of a script is essentially stateless and single-step, classes and data structures need to be rebuilt and reloaded (in theory) at each request.

In practice, this does not happen, because servers are designed to provide at least rudimentary caching. However, the runtime performance of interpreted languages is typically several orders of magnitude lower than that of compiled languages, which magnifies the problem. To illustrate my point, consider these benchmarks for PHP frameworks kindly provided by Paul M. Jones. According to these figures, a trivial PHP page is served by Apache 2 at a performance reduction of 43% compared to static HTML. The use of various PHP web frameworks reduces performance by a further 85% – 95% compared to a PHP page that merely echoes content. Although the relative overhead can be expected to shrink as application code grows in complexity, the slowdown is significant.

PHP offers a number of remedies, such as opcode caching, object caching, and products such as Zend Server, APC, and MCache, yet performance is unlikely to get even close to that of a compiled language. Furthermore, there is the question of whether the complexity of the project justifies the complexity introduced by a web framework. Would you use a web framework for building a guestbook script? Probably not. What about blog software? A photo gallery? A bulletin board? These types of applications are the mainstay of dynamic languages such as PHP. It is the area where PHP really shines. Think of WordPress, phpBB, MediaWiki, Drupal, osCommerce, Coppermine, and other popular applications. They all have one thing in common: they don’t use a framework.

Hence, before choosing a web framework for PHP development, it may be worth pondering whether one is required at all. This suggestion may sound a bit contradictory, coming as it does just after the review of the Zend Framework in a previous article. However, in my own practice I haven’t come across many complex PHP projects. The commercial PHP projects I worked on during the last 10 years can roughly be divided into three categories: 1. extensions and customisations of open source packages, 2. intranet information systems, and 3. e-commerce systems and “catalogware”.

Although the latter two may be considered candidates for web frameworks, the size of these projects was almost always small enough to do without one. On several occasions, I chose to implement an “ultralight” MVC architecture by hand instead of using an out-of-the-box framework. The main reason for this was again performance. The “ultralight” approach means implementing only the required functionality, which results in a highly specialised design.

In practice, this means slimming down the controller, reducing DB abstraction to a thin wrapper around the native library, and foregoing a templating system in favour of embedded PHP. The advantage of this approach is that you get separation of presentation and business logic, componentisation, and customisable control flow without the performance cost of a full-blown framework. The disadvantage is that it is slightly more laborious to implement and less flexible. Don’t get me wrong: I have no problem imagining scenarios where I would want to use a PHP web framework such as the Zend Framework. However, in those cases I’d probably be drawn towards using Java or (hopefully) Scala in the first place. In summary, I have found myself using PHP mostly in situations where a web framework seemed dispensable, and Java mostly in situations where a web framework seemed essential.

The Problem With Cup Typing

First I should explain what I mean by cup typing. When you buy a cup of coffee, you have the choice of a short, tall, or grande cup. Sometimes you can also choose between decaf and regular. When you declare an integer variable in Java, you have the choice of byte, short, int, and long. Sometimes (in languages like C++) you can also choose between signed and unsigned. The similarity is obvious. And it doesn’t end with integers. Floating point numbers come in two different flavours, namely as regular “float” values (32-bit) and as “double” values (64-bit). Characters come in 7-bit, 8-bit, and 16-bit encodings. In statically typed programming languages, multiplicity is the rule rather than the exception. While Fortran and Pascal offer a moderate choice of two different integers, Java offers four, plus a BigInteger implementation (“extra grande”) for really large numbers. However, it’s C# that takes the biscuit in cup typing, with 9 different integer types and 3 different real types. Database systems are keeping up with this trend. For example, the popular MySQL RDBMS offers 5 different integer types and 3 different real types. Looking at the evolution from Fortran to C#, it almost appears as if type plurality has increased over time. We must ask two things: how did this come about, and is it useful? We appreciate the fact that we can buy coffee in different cup sizes to match our appetite, but does the same advantage apply to data types?

The first question is easy to answer. Graduated types result from the fact that computer architectures have evolved in powers of two. Over several decades, the register width of the CPU in an average PC has expanded from 8 to 16 to 32 to 64 bits. Each step facilitated the use of larger types, and numeric types in particular were closely matched to register width. Expressing data types in a machine-oriented way appears to be a C legacy, and quite a few newer programming languages have been strongly influenced by C. It is my contention that while curly braces and ternary operators are an acceptable C-language tradition, graduated types are definitely not. Why not? Because they counter abstraction. They hinder rather than serve the natural expression of mathematical constructs. Have you ever wondered whether you should index an array with byte- or short-sized integers? Whether you should calculate an offset using int or long values? Whether method calls comply with type widening rules? Whether an arithmetic operation might overflow? Whether a type cast may lose significant bits? All of this is a complete waste of time in my view. Wouldn’t it be better to let the virtual machine worry about such low-level questions, or the library if a VM is not present? Cup typing gets positively annoying when you have to write an API that is flexible enough to deal with parameters of different widths. If there’s no type hierarchy, you inevitably end up with multiple overloaded constructors and methods (one for each type), which add unnecessary bulk. The Java APIs are full of such examples, and the overloaded valueOf() methods are a case in point. It’s really ugly.
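To make the point concrete, here is a minimal sketch of the problem (the Stats class and its mean() methods are hypothetical, invented for illustration): without a unified integer type, a library author has to duplicate the same logic once per width, just as the JDK does with the many overloads of String.valueOf().

    // Hypothetical API illustrating overload proliferation: one overload
    // per integer width, all implementing exactly the same concept.
    public final class Stats {
        public static double mean(byte[] v)  { long s = 0; for (byte x : v)  s += x; return (double) s / v.length; }
        public static double mean(short[] v) { long s = 0; for (short x : v) s += x; return (double) s / v.length; }
        public static double mean(int[] v)   { long s = 0; for (int x : v)   s += x; return (double) s / v.length; }
        public static double mean(long[] v)  { double s = 0; for (long x : v) s += x; return s / v.length; }

        public static void main(String[] args) {
            System.out.println(mean(new int[] {1, 2, 3, 4})); // prints 2.5
        }
    }

Four methods, one concept; a single general integer type would collapse them into one.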

However, graduated types are beyond ugly; they are outright evil. They cause an enormous number of bugs, and the small numeric types are the prime offenders. I wonder how many times a signed or unsigned byte has caused erratic program behaviour by silently overflowing. Such bugs can be hard to find and, worse, they often don’t show until certain boundary conditions are reached. Casts that shorten types also belong to the usual suspects. I shall not even mention the insidious floating point operations that regularly unsettle newbie programmers with funny-looking computation results. What numeric types does one really need? Integer numbers and real numbers, one of each and no more. If you want to be generous as a language designer, you can throw in an optimised implementation of a complex number type and a rational number type. However, in an object-oriented language with operator overloading, it’s fairly easy to express these in a library. The fixed-point type (sometimes called the decimal type) is the subset of the rational type where the denominator is always a power of ten. So that’s really all you need: a clean representation of the basic mathematical number systems.
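All three failure modes are easy to demonstrate in a few lines of Java; this is a minimal sketch, and the printed values are what the JVM actually produces:

    public class OverflowDemo {
        public static void main(String[] args) {
            // Silent byte overflow: 127 + 1 wraps around to -128.
            byte b = Byte.MAX_VALUE;
            b++;                            // no error, no warning
            System.out.println(b);          // prints -128

            // A narrowing cast silently discards the high-order bits.
            int big = 100000;
            short s = (short) big;
            System.out.println(s);          // prints -31072

            // Binary floating point greets newcomers with results like this.
            System.out.println(0.1 + 0.2);  // prints 0.30000000000000004
        }
    }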

At this point, you might object: “but the CPU register is only x bits wide,” or “how do I allocate an array of fifty thousand short values?”, or “can I still have 8-bit chars?” Unfortunately, there is no simple answer to these questions. The natural way to represent integers is to always use the machine’s native word width, but that doesn’t solve the problem. First of all, the word width is architecture-dependent. Second, it would be wasteful for large arrays that hold small numbers, yet still too small for applications that need big integers. The solution is of course a variable-size type, i.e. an integer representation that can grow from byte size to multiple word lengths. We have variable-length strings, so why shouldn’t we have variable-length numbers? It seems perfectly natural. There is certainly some overhead involved, because variable-length types need special encoding; the overhead is most likely due to loading a descriptor value and/or bit-shifting operations. Variable-length numbers don’t come for free, but they offer tremendous advantages. They relieve the programmer from making type-width decisions, from documenting those decisions, and, worse, from changing the type width later if the decision turns out to be inadequate. Furthermore, they eliminate the above-mentioned bugs resulting from silent overflows and type cast errors, not to mention API proliferation due to type plurality. Thus variable-length numbers are generally preferable to the common fixed-width types.
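Java’s BigInteger hints at what this looks like, albeit as a library type rather than a built-in: it grows to whatever width a value needs, trading some speed for correctness. A minimal sketch:

    import java.math.BigInteger;

    public class VarLengthDemo {
        public static void main(String[] args) {
            // A long silently wraps past 2^63 - 1 ...
            long l = Long.MAX_VALUE;
            System.out.println(l + 1);                  // -9223372036854775808

            // ... while BigInteger simply grows to the required width.
            BigInteger n = BigInteger.valueOf(Long.MAX_VALUE);
            System.out.println(n.add(BigInteger.ONE));  // 9223372036854775808
            System.out.println(n.pow(3));               // a 57-digit number, no overflow
        }
    }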

Of course, there are situations where you know that you will never need more than a byte. There are also situations where performance is paramount. In addition, APIs and libraries based on multiple fixed types are not going to disappear overnight. To provide backward compatibility and to offer optimisation pathways to the programmer, a language could present these as subsets of the mathematical type. For example, if a language defined the keyword “int” for variable-length integer numbers, then “int(8)” could mean a traditional byte, “int(16)” a short word, and so on. Now, this is a bit like reintroducing cup typing through the back door, so the use of subtypes for general-purpose computations should be discouraged. However, it’s always better to have a choice of fixed and variable types than to have no variable types at all.

Parallel Programming

During the last few years, we have seen a change of direction in CPU design. Until about 2003, CPUs became more powerful through frequency scaling. The number of operations per second increased exponentially over a period of 20 years: clock speeds went from 4.77 MHz in the first IBM PC (1981) to 3.3 GHz in a high-end PC of 2003. Thus, the performance gains associated with Moore’s law were delivered chiefly through rising clock speeds. Then something “strange” happened: clock speeds plateaued. For a brief moment, it appeared that Moore’s law was nearing its end. Not because of the physical limits of miniaturisation, or because higher clock speeds were impossible, but because of the thermal problems associated with such high clock frequencies. Water-cooled systems had already been introduced at the top end of the market, but the energy efficiency of these systems presents a serious economic problem. Then in 2004, the first dual-core PC processor was introduced. Dual cores became widely available in 2005, and by now we have become used to quad-core processors, while eight-core systems are around the corner. The trend is obvious: in the foreseeable future we will move from multi-core to many-core CPUs with dozens of cores, and perhaps more.

Increasing the clock speed of a CPU has the effect that program execution speed increases proportionally with clock speed, independently of software and hardware architecture. A sequential algorithm simply runs twice as fast on a machine where CPU operations take half the time. Unfortunately, the same is not true for a machine with twice as many CPUs. A sequential algorithm runs just as fast on a 3 GHz single-core CPU as on a 3 GHz dual-core CPU, because it uses only one of the cores. Increased overall execution speed is therefore only achieved when multiple tasks run concurrently, so that they can be executed simultaneously on different cores. This problem of parallelism is not new in computer science. Unfortunately, though, today’s programming languages aren’t very well equipped to deal with parallel scaling. The computer language idiom that typifies multi-core concurrency is the thread model. Threads are lightweight processes that are typically used for self-contained subtasks. Several languages, notably Java, offer APIs and native support for their implementation. Threads are well suited to asynchronous processing, for example in communication and networking. They can also be used for simple parallelisation where a computing problem is parallel by nature (for example, serving multiple web documents at the same time). However, threads are relatively cumbersome and thus not really suitable for fine-grained parallel programming, such as traversing data structures or executing nested loops in parallel. Edward A. Lee describes the problems with threads in this excellent article.
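The following sketch shows what even a trivially parallel task, summing an array on two cores, costs in thread boilerplate (the class and variable names are mine, invented for illustration): manual partitioning, manual result merging, and explicit synchronisation.

    // Coarse-grained parallelism with plain Java threads: each thread
    // sums one half of the array; the merge and join are done by hand.
    public class ThreadSum {
        public static void main(String[] args) throws InterruptedException {
            final int[] data = new int[1000000];
            java.util.Arrays.fill(data, 1);

            final long[] partial = new long[2];
            Thread[] workers = new Thread[2];
            for (int t = 0; t < 2; t++) {
                final int id = t;
                final int from = id * data.length / 2;
                final int to = (id + 1) * data.length / 2;
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        long sum = 0;
                        for (int i = from; i < to; i++) sum += data[i];
                        partial[id] = sum;           // manual merge step
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join();       // explicit synchronisation
            System.out.println(partial[0] + partial[1]); // prints 1000000
        }
    }

Imagine writing this for every loop in a program and the cumbersomeness becomes obvious.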

The fundamental problem that software engineers face is described by Amdahl’s law. Amdahl’s formula expresses the speedup of an algorithm achieved by parallelisation: speedup = N / (B*N + (1-B)), where B is the non-parallelisable (sequential) fraction of the problem and N is the number of worker threads, or processor cores. There are two notable things about Amdahl’s law: (1) the speedup depends heavily on B, and (2) the curve flattens out with increasing N, approaching the hard limit 1/B. It’s also important to know that Amdahl’s law assumes a constant problem size and ignores the control overhead that parallelisation itself introduces, so real speedups tend to be lower still. Nevertheless, we can draw several conclusions from Amdahl’s law. First, the performance gain from parallel scaling is lower than that of frequency scaling. Second, it is not proportional to the number of cores. Third, it is highly dependent on software architecture, since the latter chiefly determines the size of B. From the perspective of a software engineer, the last conclusion is probably the most interesting one. It leads to the question of what can be done to maximise parallelisation at the level of the algorithm by exploiting data and task parallelism. The present thread model is too coarse-grained to provide solutions at the algorithm level. Hence, what is called for are new programming idioms that make this task easier.
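Plugging numbers into the formula makes both observations concrete. This little sketch (the values of B are chosen arbitrarily for illustration) tabulates the speedup for a small and a large sequential fraction:

    // Amdahl's law: speedup = N / (B*N + (1 - B)).
    public class Amdahl {
        static double speedup(double b, int n) {
            return n / (b * n + (1 - b));
        }

        public static void main(String[] args) {
            int[] cores = {1, 2, 4, 8, 16, 64, 1024};
            for (double b : new double[] {0.05, 0.25}) {
                System.out.printf("B = %.2f  (limit 1/B = %.1f)%n", b, 1 / b);
                for (int n : cores)
                    System.out.printf("  N = %4d  speedup = %6.2f%n", n, speedup(b, n));
            }
        }
    }

With B = 0.25, even a 1024-core machine delivers a speedup of barely 4.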

Ideally, parallelisation would be fully automatic, implemented by the compiler and the underlying hardware architecture. In that case, the application programmer could simply continue to formulate sequential algorithms without having to worry about parallelism. Unfortunately, automatic parallelisation has turned out to be extremely complex and difficult to realise; despite many years of research in compiler architecture, there are no satisfactory results. Parallelisation at the algorithm level involves several abstraction steps, such as decomposition, mapping, scheduling, synchronisation, and result merging. The hardest among these is very likely decomposition, which means breaking down a sequential task into parts that can be solved in parallel. The preferred method for doing this is divide and conquer: a problem is divided into identical smaller pieces, where the division can be expressed recursively or iteratively; the chunks are then solved in parallel and the subtask results are joined into one final result. Prime examples of this strategy are merge sort and sequential search, both easily parallelisable. Likewise, many operations on arrays and collections can be parallelised fairly easily. But certain tasks are not obviously parallelisable, for example the computation of the Fibonacci series: f(n) = f(n-1) + f(n-2). Since every step in the computation depends on the previous steps, the straightforward Fibonacci algorithm is sequential by nature. In theoretical computer science, the question whether every efficiently solvable problem (the class P) can also be parallelised efficiently is still unsolved. For practical purposes, some problems are simply non-parallelisable.

Given that automatic parallelisation is currently out of reach, the facilitation of algorithm parallelisation in computer languages boils down to two needs: (1) expressive constructs that allow programmers to express parallelism intuitively, and (2) freeing programmers from writing boilerplate code for the drudge work of parallel execution, such as scheduling, control, and synchronisation. There are already a number of computer languages that provide built-in idioms for parallel programming, such as Erlang, Parallel Haskell, MPD, Fortress, Linda, Charm++, and others. However, these are fringe languages with a small number of users. It is questionable whether the parallel scaling trend in hardware will lead to a wide adoption of any of them. Perhaps mainstream languages will evolve to acquire new APIs, libraries, and idioms to support parallel programming. For example, Compositional C++, MPI, Unified Parallel C, Co-Array Fortran, and other existing extensions make widely used languages more suitable for parallel programming, although there aren’t any established standards yet. It also remains to be seen whether the functional programming paradigm will catch on in view of parallel programming. Java is promising, because JDK 7 (Dolphin) will contain a new API for fine-grained parallel processing. In this very informative article, Brian Goetz, the author of Java Concurrency in Practice, introduces the new java.util.concurrent API features of Java SE 7. They include a fork-join framework that simplifies the expression of parallel processing, for example in loops or in recursive method invocations, as well as new data structures, such as ParallelArray, that make parallel iteration easier to express. To learn more about parallel computing, read the free Introduction To Parallel Computing by Blaise Barney or search the free parallel programming books at Google Books.
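To give a flavour of the fork-join style, here is the divide-and-conquer pattern from the previous paragraph applied to the array sum from the earlier thread example. This is a sketch written against the ForkJoinPool and RecursiveTask classes previewed in Goetz’s article; details of the final JDK 7 API may differ.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Fork-join divide and conquer: split the range recursively, fork one
    // half, compute the other directly, then join the two results.
    public class ForkJoinSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10000; // below this, just loop
        private final int[] data;
        private final int from, to;

        ForkJoinSum(int[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {            // small enough: sequential
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) / 2;               // decompose
            ForkJoinSum left = new ForkJoinSum(data, from, mid);
            ForkJoinSum right = new ForkJoinSum(data, mid, to);
            left.fork();                             // schedule left half
            return right.compute() + left.join();    // compute right, merge
        }

        public static void main(String[] args) {
            int[] data = new int[1000000];
            java.util.Arrays.fill(data, 1);
            long sum = new ForkJoinPool().invoke(new ForkJoinSum(data, 0, data.length));
            System.out.println(sum);                 // prints 1000000
        }
    }

Compare this with the hand-rolled thread version earlier: partitioning, scheduling, and synchronisation are handled by the framework, and the programmer only expresses the decomposition.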