Scala Hits Top 30

The Scala programming language hit the top 30 of the TIOBE index for the first time in April this year. The TIOBE index measures the popularity of programming languages by counting searches for the respective programming language in the most popular search engines. In April 2009, Scala searches were tracked at 0.237% of all searches, which places it at rank 28. This means it is already ahead of other functional languages such as Haskell, Erlang and Caml. The TIOBE track record is an indicator of Scala’s growing popularity. Scala entered the TIOBE index in early 2008 and appeared in the Top 50 for the first time in the fourth quarter of 2008.

My Journey Through the World of Programming Languages

Photo by Phil Jackson

My journey through the world of programming languages began in 1987 with the blinking cursor on a black-and-white computer screen of an Atari ST 1040 computer. After a few hours of playing with the GFA BASIC interpreter, I was hooked. The graphical capabilities of the Atari computer made it possible to program Mandelbrot fractals, the Towers of Hanoi, the Breakout game, and all those things which newbie programmers like to entertain themselves with.

Quite a few of these programs looked peculiarly similar to what people programmed ten years later when the first Java applets appeared. But I am getting ahead of myself. Back in 1987, BASIC was the beginner language. The GFA BASIC dialect was considered quite modern at the time, since it didn’t have line numbers and it was a full-featured procedural language, at least in principle. Yet, it was still a toy. After about six months I felt like writing more ambitious projects and I realised that I had outgrown BASIC. Someone gave me a copy of a C compiler, so I started learning the C language.

This was a good decision as it turned out later, because I was able to use C throughout the first five years of my career. I found Kernighan-Ritchie style C to be conceptually very close to the GFA BASIC I had started with except for pointers, which were completely new. The study of C led me to Unix. I began writing clones of Unix tools and utilities for my own use. This was the late 80s before the GNU and Linux phenomena appeared.

One such project was a text editor that I enhanced with optimised scrolling routines in 386 Assembler language. I wrote the editor after I had exchanged my Atari computer for a PC. After a few months I had a number of common Unix tools and a nice text editor at my disposal which I could use under MS-DOS. Then I read Andrew Tanenbaum’s Minix book and I got into system programming. I wrote a micro-kernel task scheduler for the 386 in Assembler. Multi-tasking was a fascinating thing that seemed to be out of reach for an average personal computer. At the time, I briefly considered expanding the micro-kernel into a more complete OS by adding memory management and file management. However, I soon realised the immensity of this task. I had just started studying informatics, and I figured that I wouldn’t be able to accomplish it while still visiting lectures and doing homework.

At university, we were taught Pascal as a “first” programming language and Lisp as a second. Pascal was very easy, of course; it seemed like a verbose dialect of C. Lisp, on the other hand, I found quite repulsive. I could appreciate the underlying mathematical idea, the lambda calculus, but the syntax was just awful. I believe it was IEEE Scheme. The language seemed great for graph-theoretical problems, but unsuitable to express common algorithms in a natural way. In other words, I found it to be a language for eggheads.

At the time, the imperative programming paradigm was predominant. It seemed the best way to get things done, as development tools and libraries for imperative procedural languages were readily available. The next language I learned at the university was Modula 2. I thought of it as an elaboration of Pascal with emphasis on data abstraction and encapsulation. From Modula 2 I learned the importance of encapsulation. Although I didn’t use Modula 2 for practical applications, I was able to apply the conceptual foundation in my work that revolved around C  programming.

After university, I worked in systems programming. I designed and implemented drivers for a company that manufactured proprietary hardware. Then I changed to work with another company in the field of machine translation and computer based training. After 5 years of coding in C, I thought it was time for a change. This was the early nineties, so I turned my attention to application programming with RAD tools which had just hit the market. I learned SQL inside out and created data-driven programs. Visual Basic 3.0 was the killer application in 1993, as it made the construction of Windows GUIs extremely easy. I was able to tie in with my prior Basic experience. Customers liked the productivity that comes with RAD.

After about a year, I dropped VB in favour of Delphi, which was superior for this purpose. Likewise, I could tie in with my previous Pascal experience. I learned the rudiments of object oriented programming with Object Pascal, which is odd given that C++ would have been the more natural path to object orientation after having programmed in C for many years. However, Object Pascal taught me proper componentisation. This was the mid-nineties and a lot of amazing things happened in the IT industry. The most important change was the commercial breakthrough of the Internet. Almost simultaneously, the Linux phenomenon happened. The IT industry boomed and technological progress was fast-paced. The Internet connected everybody everywhere and Linux brought corporate computing horsepower to the desktop.

As a result of these changes, I began coding HTML in 1996 and I learned JavaScript and Perl in 1997. The next year brought even more changes, as I decided to gear my business towards web development. Perl seemed like an idiosyncratic Unix solution born out of necessity. It was certainly practical for server side programming, but it was also rather painful and hackish. Fortunately, PHP appeared at around the same time and it offered a much cleaner solution for server programming.

Soon I found myself programming web applications in PHP most of the time. LAMP-based applications literally exploded on the Internet between 1998 and 2003. During this period, I also learned the rudiments of Java, C++, and C#. I was responsible for the management of projects implemented in all of these languages. Object-oriented programming had become the mainstream paradigm in the late nineties. I decided that I needed to take on one of these languages more seriously.

The obvious choice was Java, since it was general purpose, but still very strong in the field of web development. So I fully immersed myself in Java when the language made the transition from 1.4 to 1.5. At that point, Java was already mature and mainstream. As a latecomer to Java, the platform seemed huge to me, certainly larger than anything I had looked at before, including .NET. The sheer number of APIs was just unbelievable. It required a sustained effort of two years during which I read a shelf of Java books and began moving from trivial programming exercises to small projects and then to larger projects. Since the mid-2000s, Java has become my mainstay.

There are two reasons why I like Java. First, there is a fantastic eco-system connected to the platform. It ranges from best-of-breed IDEs, VMs, and app-servers to a gazillion libraries and frameworks, and (almost) everything is free. Second, Java is extremely scalable and robust. It is neither the purest object-oriented language nor the richest, but Java is probably the one language that transforms average programmers into software engineers. I argue that this is so because of the high level of standardisation and best practices endorsement in the Java community.

I know that there are quite a few people who debate that. However, there’s a reason why universities teach Java to freshmen and why corporations use Java for enterprise development. It offers the largest and possibly the most robust platform for developing industrial-strength software. Of course, not everything is hunky-dory in the Java department. I perceive that the main problem is the language itself. – It’s aging. – Although (or perhaps because) it forces programmers to write tidy code and relinquish dirty C tricks, it tends to be tedious, as it involves generous amounts of boilerplate code. It also lacks good paradigms for fine-grained concurrency control.

Fortunately, with the Scala language I discovered a possible solution for these problems. At this point (early 2009) I haven’t yet done any larger projects in Scala, but my eagerness to do so is growing. Adding the functional paradigm to my programming instruments is very beneficial. It even flows over into my Java work, since it has changed the way I phrase algorithms in Java. The only negative effect is that by learning Scala, the limitations of the Java language became more evident and thus more painful. While functional programming will probably grow in the near future, Java has such a strong position that it won’t just fade away. Many large systems have been created in Java, so there will be maintenance work for decades to come. Meanwhile, it will be interesting to see how fast the industry embraces functional programming.

Java vs. PHP vs. Scala

I have a bit of a dilemma with programming languages. Next year, I expect to be able to free up a little extra time for a private programming project (call me an optimist!) and I am wondering which language/technology to use. The project is quite straightforward. It's a business application that I use for my own work as a software engineer. It consists of four components. There's a contact manager component (or CRM as it's now fashionably called), a project management component, a time tracking component, and a billing component. That may sound like a tall order, but obviously I don't need the full-blown functionality of applications like Siebel, MS Project, or SAP. I just need an application that brings certain functionality together in a quite specific way to suit my needs.

The software I am currently using for this purpose consists of two different programs. The CRM and billing components are contained in a Delphi application which I wrote more than 10 years ago. The time sheet and project management components are part of a PHP application that I developed in 2002. Needless to say, these two programs are neither cutting-edge nor well integrated. The Delphi application uses an outdated Borland Paradox DB and the PHP application contains large swathes of ugly procedural code. Although the whole shebang fulfils its purpose, I feel it's high time for a replacement. Of course, I could acquire an existing software package and save a lot of time writing code myself. But hey, I am a software engineer. I do like a creative challenge and I want something that fits my needs. I also want to learn new technologies.

The question I am asking myself now is what to use for the task. I am considering Java, PHP, and Scala. There are pros and cons for each of these:

(1) Java, JSP and a web framework with an app server. This is the obvious choice. Most of my professional work is JEE-based these days. I believe that I can work productively with Java, although the language inevitably involves a lot of boilerplate code and redundancy, which has a negative impact on productivity. In spite of this, it would be a good opportunity to deepen my knowledge of JSF (Java Server Faces), Hibernate, or try out some other persistence layer. It would also offer an opportunity to learn a new Java web framework that I haven't yet worked with such as Spring or Tapestry. From a business point of view, this may be a good choice because Java technologies are in high demand and it is also a very robust platform. The JEE universe is really quite large and there's enough territory that would be fun to explore. The downside is that Java, the language, is slightly tedious.

(2) The second choice is PHP and the Zend framework in combination with some AJAX toolkit, such as YUI or Dojo. I have the feeling this would be the most productive way to go; the biggest bang for the buck so to speak. For a project of this size (around 50 kloc), the development time may even be half of that with Java. PHP 5 and the Zend framework are mature technologies and I am quite familiar with both. Another advantage of PHP is that it's widespread. Almost every hosting company offers PHP, whereas the number of Java hosting companies is considerably smaller (and usually more expensive). So, there wouldn't be any problem hosting the finished product anywhere. The downside is that PHP, being a dynamic language, is less robust and slower than JVM bytecode. The language is also less expressive. But the biggest disadvantage is that I'd hardly learn anything new in the process.

(3) The third alternative is using Scala in combination with the Lift framework and a standard web container. I find this the most exciting choice, but it's very likely to be the most time consuming. I am rather new to Scala and functional programming. What I have seen so far is great. Programming in Scala is much more fun than coding in Java or PHP. I am afraid, though, that it would take a bit of time to wrap my head around it and work productively. Scala is still a foreign language to me. Another downside is that there is a limited choice of frameworks, APIs, and tools available at this point. Actually, Lift is the only Scala web framework I know of. Another question I am asking myself is whether acquiring Scala skills makes any business sense. I haven't seen too many Scala job offerings so far. Seems like the most fun choice, but also the least promising from a business point of view. Decisions, decisions, decisions…

Make WAR with Eclipse

No, it has nothing to do with armed conflict. Making WAR files is the Java way of packaging, distributing, and deploying web applications. While JAR stands for “Java archive”, WAR stands for “Web application archive”, or simply “Web archive”. In fact, the JAR and WAR formats are both ZIP archives that can include a manifest. While a JAR file typically contains a collection of class files, a WAR file contains the entire content that goes into a Java Web application. More precisely, a WAR file contains all the static content, directories, JSPs, beans and classes, libraries, as well as the web.xml deployment descriptor. If you unpack a WAR file, you get a directory structure that mirrors the document root of a deployed application in a web container, such as Tomcat. I recently had to create a Web application in Eclipse. I realised that despite having worked with Eclipse for five years, this is something I had never done before, because in the past I used NetBeans for creating web applications. But it’s just as easy in Eclipse. Here is how:

To create a Java web project, you need to have the following software installed: a Java JDK, a recent version of Eclipse that contains the Web Tools Platform (WTP) module, and a web container or an application server, such as Tomcat, JBoss, WebSphere, etc.

1…Select File/New/Project from the menu. The following dialogue appears:

webapp-img01.png

2…Select Dynamic Web Project from the list and click on the Next button.

webapp-img02.png

3…Type a name for the new project and select a file system location for it. In the Target Runtime option, specify the web container or application server you are using. This server is used to build and deploy your web application. If the drop-down box does not contain the desired server, click New… and select one of the predefined configurations (see Step 4). If you have already defined a Target Runtime, you can skip ahead to Step 6. The Dynamic Web Module version option specifies the architecture you are going to use in the web project. Select the latest version for a new project. Unfortunately, this cannot be changed later. By clicking the Modify… button in the Configuration section, you can select “facets” for your web application. What Eclipse calls “facets” are various building blocks and APIs, such as Java Server Faces, Java Persistence API, etc., that add functionality to your application.

webapp-img03.png

4…The New… button in the Target Runtime section opens a dialogue that lets you select the server on which the application is developed and deployed, which is probably the most important aspect of your configuration. Eclipse offers a number of common configurations for popular servers. If you cannot find your server in this list, click on the Download additional server adapters link and chances are that your server is listed. Make sure that the Create a new local server option is checked, so that you can find the server in the Eclipse server view later on.

webapp-img04.png

5…Once you have specified the server type, you need to provide some details about it, such as the installation directory of the server, or the server root, and the JRE you want the server to run on. Click Finish when done.

webapp-img05.png

6…Finally, the dynamic web project wizard prompts you for some basic configuration data. The Context Root is the name that the web container matches with the  location where the application is deployed and simultaneously constitutes the root URL for the web application. The Content Directory specifies the name of the directory that contains the web application files. The Java Source Directory specifies the name of the directory that contains Java source code files. These settings are only relevant to the development machine. Make sure that the Generate deployment descriptor option is checked in order to automatically create the web.xml file. In most cases, you can probably accept the default settings and click Finish.

webapp-img06.png

7…Voilà. You have created a web application, or rather the framework for its development in Eclipse. The new project should now be visible in the Navigator view. There aren’t any files yet, except the ones which were generated automatically by Eclipse. The next step would be to write your web application, possibly incorporating the application framework of your choice. Piece of cake.

webapp-img07.png

8…The Server view should display the server you have chosen for your project. If everything went OK, you can start and stop the server from this view. The server can be started in normal mode, debug mode, or profiling mode. Debug mode needs to be selected if you want to define breakpoints in your Java code. While you edit sources, such as JSP files, servlets, bean classes, static content, etc., Eclipse automatically redeploys these resources to the running server as soon as you save them. You can view your web application in a separate browser window and receive debug output in Eclipse’s Console view.

webapp-img08.png

9…After you have written your formidable web application, it’s time to share it with the world, or in more technical terms, to distribute and deploy it. The process of creating a distributable WAR file is extremely simple. Select File/Export from the Eclipse menu and click on the WAR file option in the Web category.

webapp-img09.png

10…After clicking the Next button, specify the web project to be packaged, the file destination, and the target server. Although the latter is not a mandatory option, it is probably an important one. The selected server is likely to be the same as the one chosen in Step 3. Click Finish and there you have your masterpiece in a handy WAR format.
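
Since a WAR file is an ordinary ZIP archive with a conventional layout, it can also be produced and inspected with the standard java.util.jar classes. The following is only a minimal sketch of that idea, not what Eclipse does internally. The class name WarSketch, the two entry names, and the placeholder file contents are made up for illustration; a real export contains your actual project files and a complete web.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class WarSketch {

    // Builds a tiny WAR in memory. Entry names and contents are illustrative placeholders.
    public static byte[] buildWar() {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream war = new JarOutputStream(bytes, manifest)) {
            addEntry(war, "index.jsp", "<html><body>Hello</body></html>");
            addEntry(war, "WEB-INF/web.xml", "<web-app></web-app>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void addEntry(JarOutputStream war, String name, String content)
            throws IOException {
        war.putNextEntry(new JarEntry(name));
        war.write(content.getBytes(StandardCharsets.UTF_8));
        war.closeEntry();
    }

    // Every ZIP archive starts with the two signature bytes 'P' and 'K'.
    public static boolean hasZipSignature(byte[] archive) {
        return archive.length > 1 && archive[0] == 'P' && archive[1] == 'K';
    }

    // Lists the entry names; JarInputStream reads the manifest separately.
    public static List<String> listEntries(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(archive))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] war = buildWar();
        System.out.println("ZIP signature: " + hasZipSignature(war));
        System.out.println("Entries: " + listEntries(war));
    }
}
```

The same listEntries method can be pointed at the bytes of a WAR exported from Eclipse to check that WEB-INF/web.xml and your classes ended up where the web container expects them.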

The Problem With Cup Typing

First I should explain what I mean by cup typing. When you buy a cup of coffee, you have the choice of short, tall, or grande sized cup. Sometimes you can also choose decaf or regular. When you declare an integer variable in Java, you have the choice of byte, short, int, and long. Sometimes (in languages like C++) you can also choose between signed and unsigned. The similarity is obvious. And it doesn’t end with integers. Floating point numbers come in two different flavours, namely as regular “float” values (32-bit) and as “double” values (64-bit). Characters come in the form of 7-bit, 8-bit and 16-bit encodings. In statically typed programming languages, multiplicity is the rule rather than the exception. While Fortran and Pascal offer a moderate choice of two different integers, Java offers four plus a BigInteger implementation (“extra grande”) for really large numbers. However, it’s C# that takes the biscuit in cup typing with 9 different integer types and 3 different real types. Database systems are keeping up with this trend. For example, the popular MySQL RDBMS offers 5 different integer types and 3 different real types. Seeing the evolution from Fortran to C#, it almost appears as if type plurality has increased over time. We must ask two things: How did this come about and is it useful? We appreciate the fact that we can buy coffee in different cup sizes to match our appetite, but does the same advantage apply to data types?
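
To make the Java "menu" concrete, here is a small sketch that prints the width and range of each primitive integer type using the constants from the standard wrapper classes. The class name CupSizes and the describe helper are just illustrative names.

```java
public class CupSizes {

    // Formats one line of the "menu": type name, width in bits, and value range.
    public static String describe(String name, int bits, long min, long max) {
        return name + ": " + bits + " bits, " + min + " to " + max;
    }

    public static void main(String[] args) {
        System.out.println(describe("byte", Byte.SIZE, Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println(describe("short", Short.SIZE, Short.MIN_VALUE, Short.MAX_VALUE));
        System.out.println(describe("int", Integer.SIZE, Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println(describe("long", Long.SIZE, Long.MIN_VALUE, Long.MAX_VALUE));
        // The two floating point flavours and the 16-bit char.
        System.out.println("float: " + Float.SIZE + " bits, double: " + Double.SIZE + " bits");
        System.out.println("char: " + Character.SIZE + " bits");
    }
}
```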

The first question is easy to answer. Graduated types result from the fact that computer architectures have evolved in powers of two. Over several decades, the register width of the CPU of an average PC has expanded from 8 to 16 to 32 to  64 bits. Each step facilitated the use of larger types and numeric types in particular were closely matched to register width. Expressing data types in a machine-oriented way appears to be a C legacy and quite a few newer programming languages have been strongly influenced by C. – It is my contention that while curly braces and ternary operators are an acceptable C-language tradition, graduated types are definitely not. Why not? Because they counter abstraction. They hinder rather than serve the natural expression of mathematical constructs. Have you ever wondered whether you should index an array with byte- or short-sized integers? Whether you should calculate an offset using int or long values? Whether method calls comply with type widening rules? Whether an arithmetic operation might overflow? Whether a type cast may lose significant bits or not? All of this is a complete waste of time in my view. Wouldn’t it be better to let the virtual machine worry about such low-level questions, or the library if a VM is not present? Cup typing gets positively annoying when you have to write an API that is flexible enough to deal with parameters of different widths. If there’s no type hierarchy, you inevitably end up with multiple overloaded constructors and methods (one for each type) which add unnecessary bulk. The Java APIs are full of such examples and the valueOf() method is a case in point – it’s really ugly.
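
The questions above are easy to reproduce. The following sketch, with made-up names (WidthPitfalls, narrow, offset, overflows), shows a narrowing cast that drops significant bits, an int offset that wraps around silently, and the overloaded String.valueOf methods that exist once per primitive width. Math.addExact is the standard library's way of making the overflow visible as an exception.

```java
public class WidthPitfalls {

    // Narrowing cast: only the low 8 bits of the int survive.
    public static byte narrow(int value) {
        return (byte) value;
    }

    // Plain int arithmetic wraps around without any warning.
    public static int offset(int base, int delta) {
        return base + delta;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    public static boolean overflows(int base, int delta) {
        try {
            Math.addExact(base, delta);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(narrow(300));                    // prints 44
        System.out.println(narrow(200));                    // prints -56
        System.out.println(offset(Integer.MAX_VALUE, 1));   // prints -2147483648
        System.out.println(overflows(Integer.MAX_VALUE, 1)); // prints true

        // One overload per width: int, long, and double each get their own method.
        System.out.println(String.valueOf(7) + " " + String.valueOf(7L) + " " + String.valueOf(7.0));
    }
}
```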

However, graduated types are beyond ugly; they are outright evil. They cause an enormous number of bugs and the small numeric types are the prime offenders. I wonder how many times a signed or unsigned byte has caused erratic program behaviour by silently overflowing. Such bugs can be hard to find and, worse, they often don’t show until certain border conditions are reached. Casts that shorten types also belong to the usual suspects. I shall not even mention the insidious floating point operations that regularly unsettle newbie programmers with funny looking computation results. What numeric types does one really need? Integer numbers and real numbers: one of each and no more. If you want to be generous as a language designer, you can throw in an optimised implementation of a complex number type and a rational number type. However, in an object-oriented language with operator overloading, it’s fairly easy to express these in a library. The fixed-point type (sometimes called decimal type) is the subset of the rational type where the denominator is always a power of ten. So, that’s really all you need: a clean representation of the basic mathematical number systems.
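
A short sketch of the offenders mentioned above, using only the standard library. The class and method names (NumericSurprises, next, floatSumIsExact, decimalSum) are illustrative. The last part uses java.math.BigDecimal as an example of a decimal type: it stores an integer (the unscaled value) together with a scale, so the value is the unscaled value divided by ten to the power of the scale.

```java
import java.math.BigDecimal;

public class NumericSurprises {

    // Incrementing a byte past 127 wraps to -128 without any error.
    public static byte next(byte b) {
        b++;
        return b;
    }

    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    public static boolean floatSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // A decimal type adds the same values without a rounding error.
    public static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(next((byte) 127));    // prints -128
        System.out.println(0.1 + 0.2);           // prints 0.30000000000000004
        System.out.println(floatSumIsExact());   // prints false

        BigDecimal sum = decimalSum();
        System.out.println(sum);                 // prints 0.3
        // 0.3 is stored as 3 / 10^1: unscaled value 3, scale 1.
        System.out.println(sum.unscaledValue() + " / 10^" + sum.scale());
    }
}
```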

At this point, you might object: “but the CPU register is only x bits wide,” or “how do I allocate an array of fifty thousand short values?”, or “can I still have 8-bit chars?” Unfortunately, there is no simple answer to these questions. The natural way to represent integers is to always use the machine’s native word width, but unfortunately that doesn’t solve the problem. First of all, the word width is architecture dependent. Second, it might be wasteful for large arrays that hold small numbers and on the other hand it would still be too small for applications that need big integers. The solution is of course a variable size type, i.e. an integer representation that can grow from byte size to multiple word lengths. We have variable length strings, so why shouldn’t we have variable length numbers? It seems perfectly natural. There is certainly some overhead involved, because variable length types need special encoding. The overhead will be most likely due to loading a descriptor value and/or to bit shifting operations. After all, variable length numbers don’t come for free, but they do offer tremendous advantages. They relieve the programmer from making type width decisions, as well as documenting these decisions – and worse – changing the type width later if the decision turned out to be inadequate. Furthermore, they eliminate the above mentioned bugs resulting from silent overflows and type cast errors, not to mention API proliferation due to type plurality. Thus variable length numbers are generally preferable to common fixed width types.
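
Java's BigInteger is a library class and not the built-in variable length type argued for here, but it gives a rough idea of the behaviour. The sketch below (class and method names are made up) computes factorials once with long and once with BigInteger. The long version silently wraps from 21! onwards, while the BigInteger simply grows, as its bitLength shows.

```java
import java.math.BigInteger;

public class GrowingIntegers {

    // Fixed width: the result wraps around once it no longer fits into 64 bits.
    public static long factorialLong(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    // Variable width: the representation grows with the value.
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 20, 21, 25}) {
            BigInteger exact = factorial(n);
            System.out.println(n + "! as long:       " + factorialLong(n));
            System.out.println(n + "! as BigInteger: " + exact
                    + " (" + exact.bitLength() + " bits)");
        }
    }
}
```

For 5! and 20! both lines agree; for 21! and 25! only the BigInteger line is correct. The price is the overhead mentioned above: every operation goes through an object that carries its own length.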

Of course, there are situations where you know that you will never need more than a byte. There are also situations where performance is paramount. In addition, APIs and libraries based on multiple fixed types are not going to disappear overnight. To provide backward compatibility and to offer optimisation pathways to the programmer, a language could present these as subsets of the mathematical type. For example, if a language defines the keyword “int” for variable length integer numbers, then “int(8)” could mean a traditional byte, “int(16)” could mean a short word, and so on. Now, this is a bit like reintroducing cup typing through the back door. Therefore the use of subtypes for general purpose computations should be discouraged. However, it’s always better to have a choice of fixed and variable types than having no variable types at all.
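
The "int(8)" notation is hypothetical, and Java has no such syntax. As a rough approximation of the idea, the following sketch defines a made-up class BoundedInt that treats a fixed width as a range-checked subset of the unbounded integer: values outside the range are rejected with an exception instead of wrapping around silently.

```java
import java.math.BigInteger;

// Hypothetical illustration of "int(n)": an unbounded integer restricted to n signed bits.
public final class BoundedInt {
    private final int bits;
    private final BigInteger value;

    private BoundedInt(int bits, BigInteger value) {
        this.bits = bits;
        this.value = value;
    }

    // Creates a value of the given width, or fails if it does not fit.
    public static BoundedInt of(int bits, BigInteger value) {
        BigInteger min = BigInteger.ONE.shiftLeft(bits - 1).negate();
        BigInteger max = BigInteger.ONE.shiftLeft(bits - 1).subtract(BigInteger.ONE);
        if (value.compareTo(min) < 0 || value.compareTo(max) > 0) {
            throw new ArithmeticException(value + " does not fit into int(" + bits + ")");
        }
        return new BoundedInt(bits, value);
    }

    public static BoundedInt of(int bits, long value) {
        return of(bits, BigInteger.valueOf(value));
    }

    // The sum is computed without width limits, then checked against this width.
    public BoundedInt plus(BoundedInt other) {
        return of(bits, value.add(other.value));
    }

    // Widening back to the general mathematical type is always safe.
    public BigInteger toBigInteger() {
        return value;
    }

    public static void main(String[] args) {
        BoundedInt a = BoundedInt.of(8, 100);
        System.out.println(a.plus(BoundedInt.of(8, 27)).toBigInteger()); // prints 127
        try {
            a.plus(BoundedInt.of(8, 28));
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real language feature would of course map int(8) to a machine byte for speed; the BigInteger inside this sketch only serves to keep the example short.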