Open source on the rise, says IDC

According to a recent IDC survey based on over 5,000 developer interviews in 116 countries, open source is gaining momentum. The phenomenon extends well beyond the traditional Linux user groups and computer hobbyists. IDC concludes that open source software ought to be viewed as the most significant, all-encompassing, and long-term trend the software industry has seen since the early 1980s.

Open source products are presently used in three quarters of all organisations, and several hundred thousand open source projects are under development. IDC says that the pervasive influence of open source will ultimately impact the software industry on a large scale and fundamentally change the value proposition of packaged software for customers. Open source products are already beginning to play a prominent role in the life-cycle of major software categories.

IDC’s research indicates that open source software is presently deployed by 71% of developers worldwide. 50% stated that the use of open source products in their organisations is growing. Finally, 54% of the surveyed organisations are themselves presently working on some type of open source product.

The study offers additional insights into the proliferation of open source software:

  • Throughout the coming decade, open source will capture a share of the software market in the low double digits and elicit fierce price competition

  • The effect of open source on the software life-cycle and on software innovation will outweigh the importance of price effects in the market

  • Three different business models will be vital for vendor success in the software industry: the software revenue model, the public collective model, and the service broker model

  • Core competencies different from traditional software production and marketing will determine vendor success in markets dominated by open source software

Dr. Anthony Picardi, senior vice president of Global Software Research at IDC, explains: “Although open source will significantly reduce the industry opportunity over the next ten years, the real impact of open source is to sustain innovations in mature software markets, thus extending the useful life of software assets and saving customers money.”

Picardi concluded: “As business requirements shift from acquiring new customers to sustaining existing ones, the competitive landscape will move towards cost savings and serving up sustaining innovations to savvy customers, along with providing mainstream software to new market segments that are willing to pay only a fraction of conventional software license fees. Open source software is ultimately a resource for sustaining innovators.”

Exploitation in the info age

When we hear about worker exploitation, we usually think about early industrialisation, sweat shops, mining corporations, commodity dumping prices, and the like. We imagine underpaid workers sweating away under hazardous conditions in stuffy factories. I am not saying that this is a thing of the past -unfortunately it is not- but times have changed. Exploitation has arrived in the info age. Cheap labour is no longer confined to the low-tech sector; it is available in a growing number of skilled services as well. The Internet makes it possible.

Web sites like rentacoder.com or elance.com specialise in service contracting on the cheap. Interested buyers are offered a variety of professional services including programming, design, web services, and professional writing. These websites function as a global marketplace for service buyers and service providers. The business model is simple: the buyer posts a description of the work and providers submit bids for the project. The contract is awarded to the most attractive bidder (which often means the cheapest), and the contracting website acts simultaneously as a broker and escrow agent. A fee is charged for the mediation, usually a percentage of the contract amount, which is paid by the contractor.

On the bright side, this creates opportunities for professionals who reside in low-income countries. The majority of service providers, especially in the IT field, are located in Southern Asia and Eastern Europe, where IT salaries are low on average. However, there is also a dark side. The competition in this low-cost market is becoming fiercer every day. I recently stumbled across an RFP posted by a Bulgarian web development company for a project that was budgeted at $500. The company expected the project to be completed in one month, provided that the programmer worked six days per week, ten hours a day. That works out to an hourly rate of roughly $2, for which apparently not even Bulgarian programmers want to work.

If you wonder whether there were any bids for this project, the answer is yes. There were plenty of them. Seemingly it is always possible to find someone who is willing to work for less. This leads to a situation where programmers churn out as many lines of code as possible in a given amount of time, just to stay competitive. It also creates a playing field for hobby coders, unemployed writers, students, and other amateur contenders. Needless to say, this comes at the expense of quality and professionalism.

What is more concerning, however, is that it also creates new niches for economic exploitation. The victims are -as always- the economically underprivileged. This emergent problem has not yet been addressed properly by any of the large freelancer websites.

Ajax: a rising star

Until recently most people associated the name Ajax either with a detergent or with a Dutch football team. This has changed as Ajax has caused a furore in the web development world. It began in 2005 with the introduction of new and highly interactive web applications, such as Gmail, Google Maps (maps.google.com) and Flickr (www.flickr.com), which are based on Ajax programming. Now Ajax is taking the world wide web by storm. The moniker Ajax stands for Asynchronous JavaScript and XML. Although often touted as a new paradigm, neither JavaScript, nor XML, nor asynchronous data transfer is new. This is probably the greatest strength of Ajax.
Because Ajax makes use of well-known web technologies, and because the skill set for these technologies is common, Ajax is spreading fast. But we are getting ahead of ourselves. What exactly is Ajax and what does it do? Ajax is a programming paradigm for web applications. It utilizes a combination of four basic web programming techniques (a minimal code sketch follows the list):

  • XHTML (in combination with CSS) for the user interface and web content.
  • JavaScript (or any ECMAScript-compliant scripting language) in connection with the DOM for the dynamic display of data and user interface components.
  • The XMLHttpRequest object (implemented in JavaScript) for the asynchronous exchange of information.
  • XML as a data format for data exchange (or alternatively plain text, JSON, or any other format).
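
Taken together, these ingredients might be combined roughly as in the following sketch. It is a minimal, simplified example: the URL /events.xml and the element id "details" are hypothetical, and error handling is omitted.

    // Minimal Ajax sketch: fetch data asynchronously and update part of
    // the page via the DOM, without reloading the whole page.
    function loadDetails() {
        var xhr = new XMLHttpRequest();        // older IE versions need ActiveX, see below
        xhr.open("GET", "/events.xml", true);  // third argument 'true' = asynchronous
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                // For brevity the raw response text is inserted directly;
                // a real application would typically parse xhr.responseXML.
                document.getElementById("details").innerHTML = xhr.responseText;
            }
        };
        xhr.send(null);
    }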

The only thing new to web developers is probably the XMLHttpRequest object. It is the implementation of an API which can be used by client-side scripting languages to transfer data to and from the server in XML format. This API goes back as far as Internet Explorer 5.0 and the year 1999 when it sported the name “XMLHTTP ActiveX Object”. As such it was primarily known to Microsoft programmers and it led a relatively secluded life. Today most up-to-date browsers support the XMLHttpRequest object and recent web applications have exploited it in new ways to improve user experience.
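
In code of that era, obtaining the request object therefore typically required a small cross-browser check, roughly along these lines (a sketch; feature detection can be done in several ways):

    // Return a request object: the standard XMLHttpRequest where available,
    // the older ActiveX control in Internet Explorer 5 and 6.
    function createRequestObject() {
        if (window.XMLHttpRequest) {
            return new XMLHttpRequest();                    // Mozilla, Safari, Opera, IE 7
        } else if (window.ActiveXObject) {
            return new ActiveXObject("Microsoft.XMLHTTP");  // IE 5 and 6
        }
        return null;                                        // no Ajax support
    }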

So what does Ajax do? That is easy to explain. Let’s look at how traditional web applications work. You fill in some information, select some options, click a button and then the web server processes your request. During that time you wait. Then the browser renders a new page, you fill out some more information, select some more options and click another button. You wait again. This process is repeated over and over. At each step of the process the entire page has to be rendered anew. While you wait for the server response and the page refresh you cannot use the application.

Ajax changes this. Let’s take Google Calendar (www.google.com/calendar) for example. Google Calendar looks very much like a desktop GUI application. It features a monthly overview, a weekly timetable, an agenda, tabs, popup windows to display detail information, and so on. While you work with this calendar, say by retrieving the details of a certain event, the application connects to the server behind the scenes and retrieves the event details from the database. Once the data becomes available to the client, it is immediately shown via DHTML without having to reload the page and without having to redraw the entire screen. While the data request is working in the background, you can still work with the calendar. Thus the application is always responsive and feels less clunky and slow than a traditional web application. In fact, it feels much more like a desktop GUI application.

Giving a web application the same look and feel as a GUI application and bringing it on par in terms of usability is -as it were- the Holy Grail of web programming. Until recently this has been very difficult to achieve, for two reasons: the statelessness of web applications, and the lack of sophisticated widgets. The statelessness is a direct consequence of the HTTP protocol, which does not deliver any context information to the browser except for cookies and URL parameters. Hence, it is up to the web application to cache and retrieve session context between successive page requests. The lack of widgets (or UI components) is due to HTML, which is rather penurious with UI elements. There is a text field, a select box, a checkbox, a radio button, a push button, and that is all you get. What is worse, the style and behaviour of these elements is difficult to control.

Does Ajax solve all these problems? Does it deliver on the web desktop GUI promise? Well, yes and no. Ajax provides great improvements in user experience by enabling asynchronous background processing through the XMLHttpRequest object. This functionality is great for filling data into UI elements and making a web application more responsive despite the transmission latency. It does not per se provide a richer user interface. The user interface still has to be coded manually, and in the case of Ajax this typically means DHTML code on the basis of JavaScript, CSS, and the DOM. A rich application with a variety of interactive elements, such as the aforementioned Google Calendar, consists of thousands of lines of DHTML code.

On the positive side, DHTML is portable. It runs on multiple browsers on multiple operating systems. It doesn’t require any plug-in or browser extension. This makes it a great choice over platform-dependent interface markup languages, such as XUL (pronounced “zool”), which runs only on Mozilla/Gecko browsers, and XAML (pronounced “zammel”), which works only with Internet Explorer. The cross-platform compatibility of DHTML has to be taken with a pinch of salt, however. Since the ECMAScript and DOM implementations vary slightly from browser to browser, DHTML programs tend to be quirky and difficult to debug. At any rate they require rigorous testing. It is not unusual for DHTML programmers to spend more than 50% of their time on debugging.

One good thing about Ajax is that it reduces the amount of data transfer between server and client. It also reduces the web server’s CPU load considerably. This is why web service providers, such as Yahoo or Google, love it. Moving the processing load to the client-side effectively reduces their cost. It is certainly also an advantage in enterprise settings where a single machine or cluster serves a large user community. In short, Ajax is light on the server side and heavy on the client side, thus countering the well-known “thin client” approach with a “thick client” approach.

A further, perhaps unexpected, advantage is that Ajax makes the implementation of an MVC (model/view/controller) architecture relatively simple. The client-side DHTML represents the presentation logic (V), whereas the server implements the model (M) and the controller (C). In practice, however, it is difficult to consign the entire controller code to the server, because this would result in an excessive number of requests and thus in considerable overhead. Depending on the nature of the application it may therefore be more practical either to move all application logic to the client, or to use a nested MVC model with controller modules on both sides (sketched below). Either way, the MVC architecture is neither an intrinsic part of Ajax nor a necessity, but it is certainly worth considering when designing an Ajax application.
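
Purely as an illustration of such a nested split, a client-side controller might look roughly like this. The endpoint /event and the helper functions are hypothetical, and the server is assumed to return a ready-made HTML fragment.

    // Hypothetical nested-MVC sketch: a small client-side controller maps a
    // user action to a server call (the model lives on the server), while
    // renderEvent() plays the role of the view and only touches the DOM.
    var calendarController = {
        showEvent: function (eventId) {
            var xhr = createRequestObject();               // from the earlier sketch
            xhr.open("GET", "/event?id=" + eventId, true); // server-side model/controller
            xhr.onreadystatechange = function () {
                if (xhr.readyState === 4 && xhr.status === 200) {
                    renderEvent(xhr.responseText);         // view update on the client
                }
            };
            xhr.send(null);
        }
    };

    function renderEvent(html) {
        document.getElementById("details").innerHTML = html;
    }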

Web developers who have previously worked with a server-side scripting language, such as JSP, PHP, or ASP, find that Ajax changes their world. Suddenly a huge chunk -if not all- of the application logic moves to JavaScript. The server-side scripts become lean and simple. In some cases they are reduced to a conduit to the backend, for example a SOAP server or a custom XML protocol. The ultimate Ajaxian approach is perhaps to rid the landscape of XML entirely and to use the JSON format instead, since JSON maps more naturally onto JavaScript data structures. However, if the data is to be transformed into markup text, it may be more efficient to process the XML with an XSLT stylesheet and produce HTML output, rather than manually parsing and translating JSON. Both approaches are sketched below.
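
For illustration, the two approaches might look roughly as follows. This is a sketch only: the fields title and time, the element id, and the variable stylesheetDoc (a previously loaded XSL document) are hypothetical, and xhr refers to a request object as in the earlier examples.

    // (a) JSON: the response text maps directly onto JavaScript objects.
    //     eval() was the common approach at the time; a dedicated JSON parser
    //     (e.g. the one from json.org) is safer for untrusted data.
    var eventData = eval("(" + xhr.responseText + ")");
    document.getElementById("details").innerHTML =
        "<b>" + eventData.title + "</b> at " + eventData.time;

    // (b) XML + XSLT (Mozilla-style API; Internet Explorer offers transformNode()):
    var processor = new XSLTProcessor();
    processor.importStylesheet(stylesheetDoc);
    var fragment = processor.transformToFragment(xhr.responseXML, document);
    document.getElementById("details").appendChild(fragment);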

So what are the drawbacks of Ajax? Are there any? Well, yes… We already mentioned its biggest disadvantage… JavaScript! Not that it is a bad language. Far from it. JavaScript is high-level, powerful, object-oriented, secure, and certainly very useful. However, it is more difficult to debug and maintain than pure server-side scripts. For example, with server-side scripts you never need to waste any thought on browser incompatibilities.

There are other disadvantages. One problem is that Ajax programs tend to break the functionality of the immensely popular and heavily used back button of the browser. The button doesn’t behave as users expect because, with successive page loads eliminated, the browser no longer keeps a meaningful history. There is a workaround for this: an invisible IFRAME element can be used for data transfer instead of the XMLHttpRequest object. The back button, or rather the history, does remember subsequent IFRAME page loads (see the sketch below).
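
The idea, very roughly, is the following. This is only a sketch with a hypothetical frame id; libraries of the time handled many more corner cases (reading the result back, bookmarking, browser differences).

    // Hidden-IFRAME workaround: navigating an invisible frame creates normal
    // history entries, so the back button keeps working.
    // Assumes markup like: <iframe id="hiddenFrame" style="display:none"></iframe>
    function loadIntoHistory(url) {
        var frame = document.getElementById("hiddenFrame");
        frame.src = url;   // each load adds an entry to the browser history
    }
    // The page then reads the result from the frame in its onload handler
    // instead of using the XMLHttpRequest object.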

Are there any alternatives to Ajax? Yes, there are many alternative technologies which can accomplish the same as Ajax. Some are experimental, some are platform-dependent. There are two mature platform-independent products that allow the creation of rich GUIs and asynchronous data transfers, namely Java and Macromedia Flash/ActionScript.

Both of these products constitute interesting and commercially viable alternatives.
Of the two, Java is better known and more widely used. Java developers are probably surprised at the recent Ajax hype. After all, distributed computing is an integral part of Java; what Ajax does, Java programmers have been doing for years. The obvious solution for the delivery of GUI applications via the web is Java applets. Unfortunately, applets are quite unpopular, because they are slow to load, isolated, and require a plug-in. Other Java technologies, such as JSP/JSF or JSP/Struts, allow the creation of standard web applications with rich user interfaces. The downside is that they rely on a Java application server or a web server with Java-specific extensions, respectively.

The popular Adobe (formerly Macromedia) Flash is a client-side technology for the creation and delivery of vector graphics and animations. It comes with its own ECMAScript-like programming language named ActionScript. Thanks to ActionScript, the Flash product is capable of more than animation. A technique called Flash Remoting, i.e. RPC executed by ActionScript, accomplishes asynchronous data transfer using XML and AMF (Action Message Format). The excellent graphics capabilities of Flash can be exploited to create rich web applications. However, there are some disadvantages: Flash is a proprietary single-vendor technology; it requires a browser plug-in (the manufacturer claims that 95% of all Internet users have the Flash player installed); and it is graphics-centric rather than document-centric. It is still an excellent alternative to Ajax, especially for applications that make heavy use of graphics.

The triumph of the lamp

It is human to feel satisfaction when one’s predictions come true. To predict the success of LAMP in 1998 wasn’t that difficult, but neither was it a no-brainer. At the time, the acronym had just been coined by the German c’t magazine and it wasn’t widely known in the corporate world. LAMP stands for Linux-Apache-MySQL-PHP, a set of open-source software that powers web servers with dynamic content. Occasionally, the ‘P’ in LAMP stands for Perl or Python instead, although PHP is now by far the most popular of the three scripting languages.

I remember having suggested a LAMP architecture for the implementation of a geo information system and extranet for a large government organisation in 1998. This project was on a tight budget, so I proposed to invest the resources into software development rather than into hardware and licenses. LAMP seemed ideal for it. However, the committee was utterly surprised that the word “Microsoft” did not appear in the proposal and they did not seem to put too much trust into any of the letters of L-A-M-P.

Luckily, another of my customers at the time was more open to the suggestion. A mid-sized logistics company was looking for a new way to do business on the web. Since the company wanted to run their own servers, the LAMP stack offered a perfect solution to do this cost-efficiently. It turned out to be a foresighted decision. LAMP quickly gathered momentum on the Internet and soon became one of the mainstream web development platforms.

One has to keep in mind that the four pieces of software that make up LAMP have not been designed as a unified platform. On the contrary, they have different histories, and they were not specifically developed to work together. This is what distinguishes them from their competitor platforms ASP/.NET and Java/J2EE.

Let’s briefly go back to the year 1998. It was the time of the browser war and the dot-com boom. The Internet then consisted of about 5 million websites, which is less than one tenth of today’s figure (2005). Linux was at kernel version 2.0.x, Apache was at 1.3.0 and the Apache Software Foundation had not yet been founded; MySQL was at version 3.21, and Andi Gutmans and Zeev Suraski had just released the crucial PHP 3.

Linux and Apache were already strong at the time. Linux was a fairly mature OS with a userbase of 7.5 million in 1998. Torvalds had just trademarked the Linux name and the corporate world was starting to take notice. The Apache web server -originally developed as an extension of the NCSA httpd server- already commanded a 50% market share. MySQL and PHP, on the other hand, were the new kids on the block. PHP had a userbase of only a few tens of thousands of users and MySQL was widely considered a toy database.

The combination of these products, however, did one thing extremely well: powering dynamic websites. Plus, they were free. Anyone who wanted to run a web server could use them without paying a single cent in license fees. Although MySQL and PHP had several limitations at the time, this did not really matter for the purpose of serving web pages. MySQL delivered impressive performance and PHP 3 was “cleaner” and easier to program than Perl. Thus the LAMP stack offered a killer platform for web applications.

Today, the quartet is more successful than ever. Apache powers 70% of all web servers. MySQL is the most widely used database on the Internet, and it has matured into a full-featured RDBMS with transactions, replication, clustering, and (as of version 5.0) stored procedures, views and triggers. PHP is now installed on 20% of all Internet sites. It fully supports the OOP paradigm and has grown an extremely large function library that makes programmers feel like kids in a candy store.

What is behind the success of LAMP? In the case of Linux and Apache this is fairly easy to tell. They are both free and they offer excellent performance. You can easily pack 50 to 100 low-traffic web sites onto a commodity PC. This makes Linux plus Apache very popular with hosting companies, who take advantage of the low cost of ownership. Furthermore, it allows service providers to customize the server’s configuration and administration model and apply it to an arbitrary number of cheap boxes, which translates into near-arbitrary scalability. Hosting companies love it.

The case is slightly different with MySQL and PHP, because their growth is driven mainly by developers rather than by hosting companies. From the start, MySQL was geared towards web applications, which means its main strength is fast concurrent reads. Besides, it is easy to use and administer. PHP has become the scripting language of choice because its learning curve is almost flat, which means that programmers can be productive with PHP from day one. In addition, PHP has been integrated very well with Apache from very early on. It uses resources efficiently and avoids the CGI model and its known security problems.

All of the LAMP components originated around 1995 or before, which means that all of them recently passed their 10th birthday. LAMP can now be considered a mature architecture. The case of LAMP proves that using an open source platform is not like buying a pig in a poke. It proves that open source gets the job done, and -in the case of web servers- that it does the job better than anything else.

Terabyte hard disks on the horizon

The first computer I bought almost exactly 20 years ago had a disk capacity of 720 kB, provided by a so-called high-density floppy drive. Somewhat later I added a 40 MB hard disk in 5.25” installation size, which was absolutely lavish for a home computer. Today (mid 2006) the largest commercially available hard disk has a capacity of 750 GB, which is roughly one million times that of a 720 kB floppy, or twenty thousand times larger than my first hard disk. The current street price for a 750 GB hard disk is about $400 USD.

This development shows that the terabyte hard disk is around the corner and we will soon have to get used to another SI prefix: tera. In fact, the SI prefix “tera” is misused here, since it refers to the decimal power 10^12, or one trillion, whereas the number of bytes on a terabyte hard disk is actually the power of two 2^40, which amounts to 1,099,511,627,776 bytes.

The IEC has devised the cute-sounding name tebibyte for this number, in conjunction with gibibyte, mebibyte, and kibibyte for the smaller binary quantities. The IEC denomination has not turned out to be very popular, however. Have you ever heard anyone speak of a gibibyte hard disk?

The etymology of the SI prefixes is likewise interesting. They all go back to the Greek language: “kilo” originates from the Greek khilioi, meaning thousand; “mega” comes from megas, which means great or mighty; “giga” goes back to gigas, meaning giant; and finally “tera” is the Greek word for monster, which is probably an apt description for a hard disk that large.

Before the monster disk becomes available, there are some technical challenges to master, in particular the superparamagnetic effect. The industry’s current answer is perpendicular recording, which aligns bits perpendicular to the disk surface.

An even newer technology -currently in the research stage- is heat-assisted magnetic recording (HAMR), where a laser beam or a similar energy source heats the disk surface while bits are being recorded. This reduces the required strength of the magnetic field, and magnetisation can thus be achieved at a higher density.

There is a problem, though: the disk’s lubricant evaporates at these temperatures, which is why the industry is now researching self-lubricating disks that use embedded nanotubes to store replacement lubricant. This technology would allow multi-terabyte hard disks to become a reality, probably just a few years from now.