Database duel – MySQL vs. PostgreSQL

Almost all non-trivial applications need to store data of some kind. If the data has the form of records, or n-tuples, it is typically handled by a relational database management system (RDBMS). Relational databases are conceptually founded on set theory and predicate logic. Data in an RDBMS is arranged in tables whose elements can be linked to each other. Today almost all RDBMS use SQL (Structured Query Language) to implement the relational model. RDBMS with SQL have been in use since the late 1970s. Once an expensive corporate technology, relational databases became available as open source during the late 1990s. Presently PostgreSQL and MySQL are the most popular open source RDBMS.

Both database systems are widely used for web applications. Although MySQL has a much larger user base (an estimated 6 million installations by 2005), the growth of PostgreSQL has recently accelerated. The latter originally came out of an academic environment: PostgreSQL was developed at the University of California, Berkeley, as a successor to the INGRES database. Until 1995 it used the QUEL query language instead of SQL. Since version 6.0, the software has been maintained and advanced by a team of volunteers and released freely under the BSD license. In contrast, MySQL was developed in a commercial environment by the Swedish company TCX Dataconsult, and later by MySQL AB. It started out as a rewrite of the mSQL database and gradually acquired more and better features. MySQL is released under a dual licensing scheme (GPL and a paid commercial license).

Since the PostgreSQL developers had a head start of almost ten years, PostgreSQL has historically had more features than MySQL, especially the advanced features that are desirable in an “enterprise” computing environment. These include advanced database storage, data management tools, data replication, and backup tools. MySQL, on the other hand, used to have an edge over PostgreSQL in terms of speed, offering better performance for concurrent database access. Lately, however, this gap has been closing: PostgreSQL is getting faster while MySQL acquires more enterprise features. The crucial 5.0 release of MySQL in October 2005 added stored procedures, triggers, and views.

Let’s look at the commonalities first. Both systems are fully relational, using SQL for data definition, data manipulation, and data retrieval. They run on Windows, Linux, and a number of Unix variants; MySQL also runs on Mac OS. Both databases come with graphical administration and query tools as well as backup, repair, and optimisation tools. They offer standard connectors such as ODBC and JDBC, as well as APIs for all major programming languages. Both systems support foreign keys and data integrity, subselects, transactions, unions, views, stored procedures, and triggers. Among the high-end features that both RDBMS offer are ACID-compliant transaction processing, multiple isolation levels, procedural languages, schemas (metadata), hot backups, data loading, replication (as an add-on in PostgreSQL), table spaces for disk storage layout, terabyte scalability, and SSL. MySQL and PostgreSQL also both support storage of geographic information (GIS). PostgreSQL additionally has network-aware data types that recognize IPv4 and IPv6 addresses.

Now, let’s look at the differences. PostgreSQL is an object-relational database, which means that it has object-oriented features, such as user-definable database objects and inheritance. Users can define data types, indexes, operators (which can be overloaded), aggregates, domains, casts, and conversions. PostgreSQL supports array data types. Inheritance in PostgreSQL allows a table to inherit characteristics from a parent table. PostgreSQL also has very advanced programming features. In addition to its native procedural language, PL/pgSQL (which resembles Oracle’s PL/SQL), PostgreSQL procedures can be written in scripting languages such as Perl, PHP, and Python, or in compiled languages such as C++ and Java. In contrast, MySQL (since version 5.0) only supports a native stored procedure language that follows the SQL:2003 standard.

PostgreSQL/MySQL Comparison Chart

In terms of features, the most evident advantage that MySQL offers is its so-called pluggable storage engines. One may choose from a number of different data storage models, which allows the database administrator to optimise databases for the intended application. For example, a web application that makes heavy use of concurrent reads with few write operations may use the MyISAM storage engine to achieve top performance, while an online booking system may use the InnoDB storage engine for ACID-compliant transactions. Another interesting characteristic of MySQL not found in PostgreSQL is its support for distributed databases, which goes beyond mere database replication. Functionality for distributed data storage is offered through the NDB and FEDERATED storage engines, which support clustered and remote databases respectively.

There are further differences, of course. MySQL is generally faster than PostgreSQL. It handles connections as threads within a single server process, instead of spawning a new process for each connection as PostgreSQL does. This is a great advantage for web applications that connect on each page view. In addition, the MyISAM storage engine provides tremendous performance for both simple and complex SELECT statements. Stability is another advantage of MySQL. Due to its larger user base, MySQL has been tested more intensively, and it has historically been more stable than PostgreSQL.

PostgreSQL has a slight advantage over MySQL/InnoDB for concurrent transactions, because it makes use of Multiversion Concurrency Control (MVCC), a mechanism otherwise found mainly in enterprise-grade commercial RDBMS. Another advantage of PostgreSQL is its relatively strict compliance with the ANSI SQL-92 and SQL-99 standards, especially with regard to data types. The ANSI SQL implementation of MySQL is less complete by comparison, although MySQL has a special ANSI mode that disregards its proprietary extensions.

With regard to backup/restore capabilities, MySQL provides somewhat less convenience than PostgreSQL and commercial enterprise RDBMS. Nevertheless, hot backup and restore operations can be performed with both systems. Both PostgreSQL and MySQL/InnoDB allow transactional tables to be backed up simply by using a single transaction that copies all relevant tables. The disadvantage of this method is that it uses a lot of resources, which might compromise system performance.

With MySQL, a better solution is to use the replication mechanism for a continuous backup. PostgreSQL allows recovery from disk failure through point-in-time recovery (PITR). This method combines file-system-level backups with a write-ahead log that records all changes to the database. Thus it is possible to recreate snapshots of the database at any point in time. In most cases, a crashed database can be recovered up to the last transaction before the crash. PITR is also convenient for large databases, since it conserves resources.

MySQL Strengths

  • Excellent code stability
  • Excellent performance, fast CONNECT and SELECT
  • Multiple storage engines to choose from
  • Larger user base (thus larger number of applications and libraries)
  • Support for distributed databases
  • Many high-quality GUI tools available
  • Commercial support widely offered

PostgreSQL Strengths

  • Object-oriented features
  • Advanced programming concepts
  • Supports multiple programming languages
  • High ANSI SQL conformance
  • Mature high-end features
  • Robust online backups
  • Very liberal BSD license

In summary, PostgreSQL and MySQL are both mature products with many enterprise-level features. They are both catching up with the best commercial RDBMS and are presently making inroads into the high-end market. The two systems differ in philosophy in several ways. Roughly speaking, MySQL is targeted at developers who expect a workhorse database with proven performance, while PostgreSQL suits developers who expect advanced features and programming concepts. MySQL offers more deployment options, whereas PostgreSQL offers more flexibility for developers.

Freebie of the Month: PSPad

A good plain text editor is the Swiss army knife of every programmer. Unfortunately, the Windows operating system offers only the “Notepad” program in this category, which is the equivalent of a $1.50 plastic knife. If you want to do more than open an occasional README.TXT, then Notepad is definitely underpowered. This situation has created a market for commercial text editors, such as UltraEdit, CodeWright, and EditPlus. These are excellent products, but they are not free. In the open source arena there are well-known editors, such as GNU Emacs and vim, which have evolved on the Unix platform. These editors are very powerful, but they are quirky and not exactly easy to learn and use. Why put up with a learning curve when more user-friendly products are available? A multitude of freeware text editors with varying features is available for the Windows platform.

When I searched the Internet for a freeware editor, I was looking for raw power, speed, and features. In that order. The PSPad editor written by the Czech author Jan Fiala fits the bill perfectly. First of all, it is fast. Even on a modest Pentium IV computer, it starts up in less than two seconds. This is an important characteristic, since a text editor might get loaded dozens of times in succession for viewing or changing different files. It also makes it convenient to use PSPad when I don’t want to fire up a “heavy duty” IDE, such as Eclipse.

PSPad’s look is neat and functional. It presents itself with customisable tool bars, tabbed editor windows and a logically structured menu. Text windows can also be floated or tiled. The feature set of PSPad can compete with commercial high-end products. It includes syntax highlighting for dozens of programming languages, auto backups, macros, hex edit mode, integrated diff comparisons, pluggable text converters, customisable short-cut key map, spell checker, support for Windows, Unix, and Mac line endings, support for different character sets, HTML formatting and HTML validation through Tidy. This makes it ideal for editing a wide variety of file types from C++ source files to HTML pages, SQL statements, XML files, and shell scripts.

One feature I really liked is the multi-language code explorer, a feature that is otherwise only found in high-end IDEs. The code explorer seems to be capable of displaying almost anything from the DOM tree of an HTML document to a PHP or Java class. However, the most important aspect of a text editor for me is powerful search and replace capability. In this area, PSPad once again delivers. PSPad supports Perl-compatible regular expressions for search and replace operations, which is a make-or-break criterion for automated text processing. It also supports search and replace in multiple files, even recursively in subdirectories, which is again great for automated processing. The only limitation is that it cannot do both at the same time. It either processes regular expressions or multiple files, but not both. I am not sure why this limitation exists. Without it, PSPad would be pretty close to perfection.

Open source on the rise, says IDC

According to a recent IDC survey based on over 5,000 developer interviews in 116 countries, open source is gaining momentum. The phenomenon extends well beyond the traditional Linux user groups and computer hobbyists. IDC comes to the conclusion that open source software ought to be viewed as the most significant, all-encompassing, and long-term trend that the software industry has seen since the early 1980s.

Presently open source products are used in three fourths of all organisations, and there are several hundred thousand open source projects under development. IDC says that the pervasive influence of open source will ultimately impact the software industry on a large scale and that it will fundamentally change the value proposition of packaged software for customers. Open source products are already beginning to play a prominent role in the life-cycle of major software categories.

IDC’s research indicates that open source software is presently deployed by 71% of developers worldwide. 50% stated that the use of open source products in their organisations is growing. Finally, 54% of the surveyed organisations are themselves presently working on some type of open source product.

The study offers additional insights into the proliferation of open source software:

  • Throughout the coming decade, open source will capture a share of the software market in the low double digits and elicit fierce price competition

  • The effect of open source on the software life-cycle and on software innovation will outweigh the importance of price effects in the market

  • Three different business models will be vital for vendor success in the software industry: the software revenue model, the public collective model, and the service broker model

  • Core competencies different from traditional software production and marketing will determine vendor success in markets dominated by open source software

Dr. Anthony Picardi, senior vice president of Global Software Research at IDC, explains: “Although open source will significantly reduce the industry opportunity over the next ten years, the real impact of open source is to sustain innovations in mature software markets, thus extending the useful life of software assets and saving customers money.”

Picardi concluded that as “business requirements shift from acquiring new customers to sustaining existing ones, the competitive landscape will move towards cost savings and serving up sustaining innovations to savvy customers, along with providing mainstream software to new market segments that are willing to pay only a fraction of conventional software license fees. Open source software is ultimately a resource for sustaining innovators.”

Exploitation in the info age

When we hear about worker exploitation, we usually think of early industrialisation, sweatshops, mining corporations, commodity dumping prices, and the like. We imagine underpaid workers sweating away under hazardous conditions in stuffy factories. I am not saying that this is a thing of the past (unfortunately it is not), but times have changed. Exploitation has arrived in the info age. Cheap labour is no longer available only in the low-tech sector, but also in a growing number of skilled services. The Internet makes it possible.

Websites like rentacoder.com or elance.com specialise in service contracting on the cheap. Interested buyers are offered a variety of professional services including programming, design, web services, and professional writing. These websites function as a global market for service buyers and service providers. The business model is simple. The buyer posts a description of the work and providers submit bids for these projects. The contract is awarded to the most attractive bidder (which often means the cheapest), and the contracting website acts simultaneously as broker and escrow agent. A fee is charged for the mediation, usually a percentage of the contract amount, which is paid by the contractor.

On the bright side, this creates opportunities for professionals who reside in low-income countries. The majority of service providers, especially in the IT field, are located in Southern Asia and Eastern Europe, where IT salaries are low on average. However, there is also a dark side. The competition in this low-cost market is becoming fiercer every day. I recently stumbled across an RFP posted by a Bulgarian web development company for a project that was budgeted at $500. The company expected the project to be completed in one month, provided that the programmer worked six days per week, ten hours a day. That is roughly 260 hours of work, which comes to an hourly rate of just about $2, for which apparently even Bulgarian programmers don’t want to work.

If you wonder whether there were any bids for this project, the answer is yes. There were plenty of them. Seemingly it is always possible to find someone who is willing to work for less. This leads to a situation where programmers churn out as many lines of code as possible in a given amount of time, just to stay competitive. It also creates a playing field for hobby coders, unemployed writers, students, and other amateur contenders. Needless to say, this occurs at the expense of quality and professionalism.

What is more concerning, however, is that it also creates new niches for economic exploitation. The victims are, as always, the economically underprivileged. This emergent problem has not yet been addressed properly by any of the large freelancer websites.

Ajax: a rising star

Until recently most people associated the name Ajax either with a detergent or with a Dutch football team. This has changed as Ajax has caused a furore in the web development world. It began in 2005 with the introduction of new and highly interactive web applications, such as Gmail, Google Maps (maps.google.com), and Flickr (www.flickr.com), which are based on Ajax programming. Now Ajax is taking the world wide web by storm. The moniker Ajax stands for Asynchronous JavaScript and XML. Although often touted as a new paradigm, neither JavaScript, nor XML, nor asynchronous data transfer is new. This is probably the greatest strength of Ajax.
Because Ajax makes use of well-known web technologies, and because the skill set for these technologies is common, Ajax is spreading fast. But we are getting ahead of ourselves. What exactly is Ajax and what does it do? Ajax is a programming paradigm for web applications. It utilizes a combination of four basic web programming techniques (a minimal sketch of how they fit together follows the list):

  • XHTML (in combination with CSS) for the user interface and web content.
  • JavaScript (or any ECMAScript compliant scripting language) in connection with DOM for the dynamic display of data and user interface components.
  • The XMLHttpRequest object (implemented in JavaScript) for the asynchronous exchange of information.
  • XML as a data format for data exchange (or alternatively plain text, JSON, or any other format).
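
As a minimal sketch of how these pieces fit together, the following hypothetical JavaScript function requests the details of a calendar event and writes them into the page; the URL, the XML element names, and the element id are invented for illustration, and a real application would add error handling and the cross-browser fallback discussed below.

    // Fetch event details asynchronously and display them without a page reload.
    function showEventDetails(eventId) {
        var request = new XMLHttpRequest();                      // asynchronous transport
        request.open("GET", "/calendar/event?id=" + eventId, true);
        request.onreadystatechange = function () {
            if (request.readyState === 4 && request.status === 200) {
                var xml = request.responseXML;                   // XML as the data format
                var title = xml.getElementsByTagName("title")[0].firstChild.nodeValue;
                document.getElementById("event-details").innerHTML = title;   // DOM update
            }
        };
        request.send(null);
    }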

The only thing new to web developers is probably the XMLHttpRequest object. It is an API that client-side scripts can use to transfer data to and from the server, typically in XML format. The API goes back as far as Internet Explorer 5.0 and the year 1999, when it was known as the “XMLHTTP ActiveX Object”. As such it was primarily known to Microsoft programmers and led a relatively secluded life. Today most up-to-date browsers support the XMLHttpRequest object, and recent web applications have exploited it in new ways to improve the user experience.
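
Because older Internet Explorer versions expose the object only as an ActiveX control, Ajax code of that era usually wraps the instantiation in a small feature test. A common sketch looks roughly like this (the ProgID "Microsoft.XMLHTTP" is the classic one; newer IE versions also accept "Msxml2.XMLHTTP"):

    // Create an XMLHttpRequest object in a cross-browser way.
    function createRequest() {
        if (window.XMLHttpRequest) {                   // Mozilla, Safari, Opera, IE 7+
            return new XMLHttpRequest();
        } else if (window.ActiveXObject) {             // IE 5.0 - 6.0
            return new ActiveXObject("Microsoft.XMLHTTP");
        }
        return null;                                   // no Ajax support in this browser
    }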

So what does Ajax do? That is easy to explain. Let’s look at how traditional web applications work. You fill in some information, select some options, click a button and then the web server processes your request. During that time you wait. Then the browser renders a new page, you fill out some more information, select some more options and click another button. You wait again. This process is repeated over and over. At each step of the process the entire page has to be rendered anew. While you wait for the server response and the page refresh you cannot use the application.

Ajax changes this. Let’s take Google Calendar (www.google.com/calendar) as an example. Google Calendar looks very much like a desktop GUI application. It features a monthly overview, a weekly timetable, an agenda, tabs, popup windows to display detail information, and so on. While you work with this calendar, say by retrieving the details of a certain event, the application connects to the server behind the scenes and retrieves the event details from the database. Once the data becomes available to the client, it is immediately shown via DHTML, without reloading the page and without redrawing the entire screen. While the data request is working in the background, you can still work with the calendar. Thus the application is always responsive and feels less clunky and slow than a traditional web application. In fact, it feels much more like a desktop GUI application.

Giving a web application the same look and feel as a GUI application and bringing it on par in terms of usability is, as it were, the Holy Grail of web programming. Until recently this has been very difficult to achieve, for two reasons: the statelessness of web applications, and the lack of sophisticated widgets. The statelessness is a direct consequence of the HTTP protocol, which does not deliver any context information to the browser except for cookies and URL parameters. Hence, it is up to the web application to cache and retrieve session context between successive page requests. The lack of widgets (or UI components) is due to HTML, which is rather penurious with UI elements: there is a text field, a select box, a checkbox, a radio button, a push button, and that is all you get. What is worse, the style and behaviour of these elements is difficult to control.

Does Ajax solve all these problems? Does it deliver on the promise of a desktop-like GUI on the web? Well, yes and no. Ajax provides great improvements in user experience by enabling asynchronous background processing through the XMLHttpRequest object. This functionality is great for filling data into UI elements and making a web application responsive despite the transmission latency. It does not per se provide a richer user interface. The user interface still has to be coded manually, which in the case of Ajax typically means DHTML code based on JavaScript, CSS, and the DOM, as sketched below. A rich application with a variety of interactive elements, such as the aforementioned Google Calendar, consists of thousands of lines of DHTML code.
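
To give an impression of what coding the interface manually means, here is a small hypothetical snippet that builds a popup entirely with DOM calls and inline CSS; a full widget set repeats this kind of code many times over.

    // Build a simple popup "window" by hand with DOM and CSS (DHTML).
    function showPopup(text, x, y) {
        var popup = document.createElement("div");
        popup.style.position = "absolute";          // place it at the given coordinates
        popup.style.left = x + "px";
        popup.style.top = y + "px";
        popup.style.border = "1px solid black";
        popup.style.background = "#ffffe0";
        popup.style.padding = "4px";
        popup.appendChild(document.createTextNode(text));
        popup.onclick = function () {               // click the popup to dismiss it
            popup.parentNode.removeChild(popup);
        };
        document.body.appendChild(popup);
    }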

On the positive side, DHTML is portable. It runs in multiple browsers on multiple operating systems and does not require any plug-in or browser extension. This makes it a great choice over platform-dependent interface markup languages, such as XUL (pronounced “zool”), which runs only on Mozilla/Gecko browsers, and XAML (pronounced “zammel”), which works only with Internet Explorer. The cross-platform compatibility of DHTML has to be taken with a pinch of salt, however. Since the ECMAScript and DOM implementations vary slightly from browser to browser, DHTML programs tend to be quirky and difficult to debug. At any rate they require rigorous testing. It is not unusual for DHTML programmers to spend more than 50% of their time debugging.

One good thing about Ajax is that it reduces the amount of data transfer between server and client. It also reduces the web server’s CPU load considerably. This is why web service providers, such as Yahoo or Google, love it. Moving the processing load to the client-side effectively reduces their cost. It is certainly also an advantage in enterprise settings where a single machine or cluster serves a large user community. In short, Ajax is light on the server side and heavy on the client side, thus countering the well-known “thin client” approach with a “thick client” approach.

A further advantage, which may come unexpected, is that Ajax makes the implementation of an MVC (model/view/controller) architecture relatively simple. The client-side DHTML represents the presentation logic (V), whereas the server implements the model (M) and the controller (C). In practice, however, it is difficult to consign the entire controller code to the server, because this would result in an excessive number of requests and thus in considerable overhead. Depending on the nature of the application, it may therefore be more practical either to move all application logic to the client or to use a nested MVC model with controller modules on both sides. Either way, the MVC architecture is neither an intrinsic part of Ajax nor a necessity, but it is certainly worth considering when designing an Ajax application.
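
As a rough illustration of the nested variant, the following hypothetical sketch shows a client-side controller that reacts to a user action, asks the server (which holds the model and a second controller) for data, and tells a DHTML view to redraw; all names and the URL are invented.

    // Hypothetical client-side view: renders a day panel through the DOM.
    var calendarView = {
        renderDay: function (date, markup) {
            document.getElementById("day-panel").innerHTML = markup;
        }
    };

    // Hypothetical client-side controller: handles view events and talks to the server.
    var calendarController = {
        onDayClicked: function (date) {
            var request = new XMLHttpRequest();        // or the cross-browser helper above
            request.open("GET", "/calendar/day?date=" + date, true);
            request.onreadystatechange = function () {
                if (request.readyState === 4 && request.status === 200) {
                    calendarView.renderDay(date, request.responseText);
                }
            };
            request.send(null);
        }
    };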

Web developers who have previously worked with a server-side scripting language, such as JSP, PHP, or ASP, will find that Ajax changes their world. Suddenly a huge chunk of the application logic, if not all of it, moves to JavaScript. The server-side scripts become lean and simple. In some cases they are reduced to a conduit to the backend, for example a SOAP server or a custom XML protocol. The ultimate Ajaxian approach is perhaps to rid the landscape entirely of XML and to use the JSON format instead, since JSON maps data structures more naturally onto JavaScript. However, if the data is to be transformed into markup text, it may be more efficient to process XML with an XSLT stylesheet and produce HTML output, rather than manually parsing and translating JSON.
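
A quick sketch of the difference on the client side: a JSON payload is a single expression that becomes a JavaScript object directly, whereas an XML payload is navigated through the DOM (or handed to an XSLT stylesheet). The field names and the element id below are invented for illustration.

    // JSON response: one expression, directly usable as a JavaScript object.
    // (In classic Ajax code this was done with eval(), acceptable only for
    // trusted servers; newer browsers offer JSON.parse instead.)
    function handleJsonResponse(text) {
        // text might look like: {"id": 42, "title": "Team meeting"}
        var event = eval("(" + text + ")");
        document.getElementById("event-details").innerHTML = event.title;
    }

    // XML response: navigate the DOM of responseXML instead,
    // or hand the document to an XSLT stylesheet to produce HTML markup.
    function handleXmlResponse(xmlDoc) {
        // xmlDoc might represent: <event><id>42</id><title>Team meeting</title></event>
        var title = xmlDoc.getElementsByTagName("title")[0].firstChild.nodeValue;
        document.getElementById("event-details").innerHTML = title;
    }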

So what are the drawbacks of Ajax? Are there any? Well, yes. We already mentioned its biggest disadvantage: JavaScript! Not that it is a bad language. Far from it. JavaScript is high-level, powerful, object-oriented, secure, and certainly very useful. However, it is more difficult to debug and maintain than pure server-side scripts. For example, with server-side scripts you never need to waste any thought on browser incompatibilities.

There are other disadvantages. One problem is that Ajax programs tend to break the functionality of the sweepingly popular and heavily used back button of the browser. The button doesn’t behave as users expect because, with successive page loads eliminated, the browser no longer keeps a meaningful history. There is a workaround for this: an invisible IFRAME element can be used for data transfer instead of the XMLHttpRequest object. The back button, or rather the history, does remember subsequent IFRAME page loads, as the sketch below illustrates.
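
A minimal sketch of this workaround, with invented names and URLs: data requests are routed through a hidden IFRAME so that each request is recorded in the browser history. Real implementations have to cope with browser-specific quirks in how IFRAME navigations enter the history.

    // Create a hidden IFRAME once and reuse it as the transport channel.
    var historyFrame = document.createElement("iframe");
    historyFrame.name = "data-frame";
    historyFrame.style.display = "none";
    document.body.appendChild(historyFrame);

    function requestViaIframe(url) {
        historyFrame.src = url;        // each navigation adds a history entry
    }

    // The document loaded into the IFRAME then calls back into its parent,
    // e.g. parent.handleData(payload), and the parent updates the visible
    // page through the DOM as before.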

Are there any alternatives to Ajax? Yes, there are many alternative technologies which can accomplish the same as Ajax. Some are experimental, some are platform-dependent. There are two mature platform-independent products that allow the creation of rich GUIs and asynchronous data transfers, namely Java and Macromedia Flash/ActionScript.

Both of these products constitute interesting and commercially viable alternatives.
Of the two, Java is better known and more widely used. Java developers are probably surprised at the recent Ajax hype. After all, distributed computing is an integral part of Java: what Ajax does, Java programmers have been doing for years. The obvious solution for the delivery of GUI applications via the web is Java applets. Unfortunately, applets are quite unpopular because they are slow to load, isolated, and require a plug-in. Other Java technologies, such as JSP/JSF or JSP/Struts, allow the creation of standard web applications with rich user interfaces. The downside is that they rely on a Java application server, or at least a web server with Java-specific extensions.

The popular Adobe (formerly Macromedia) Flash is a client-side technology for the creation and delivery of vector graphics and animations. It comes with its own programming language named ActionScript, which is ECMAScript-like. Thanks to ActionScript, the Flash product is capable of more than animation. A technique called Flash Remoting, i.e. RPC executed by ActionScript, accomplishes asynchronous data transfer using XML and AMF (ActionScript Message Format). The excellent graphics capabilities of Flash can be exploited to create rich web applications. However, there are some disadvantages: Flash is a proprietary single-vendor technology; it requires a browser plug-in (the manufacturer claims that 95% of all Internet users have the Flash player installed); and it is graphics-centric rather than document-centric. It is still an excellent alternative to Ajax, especially for applications that make heavy use of graphics.