Almost all non-trivial applications need to store data of some kind. If the data has the form of records, or n-tuples, it is typically handled by a relational database management system (RDBMS). Relational databases are conceptually founded on set theory and predicate logic. Data in an RDBMS is arranged in tables whose rows can be linked to each other through shared key values. Today, almost all RDBMS use SQL (Structured Query Language) to implement the relational model. RDBMS with SQL have been in use since the late 1970s. RDBMS were long an expensive corporate technology, and the first open source RDBMS became available during the late 1990s. At present, PostgreSQL and MySQL are the most popular open source RDBMS.
Both database systems are widely used for web applications. Although MySQL has a much larger user base (an estimated 6 million installations by 2005), the growth of PostgreSQL has recently accelerated. PostgreSQL originated in an academic environment: it was developed at the University of California, Berkeley, as a successor to the INGRES database, and until 1995 it used QUEL instead of SQL. Since version 6.0, the software has been maintained and advanced by a team of volunteers and released free of charge under the BSD license. In contrast, MySQL was developed in a commercial environment, first by the Swedish company TCX Dataconsult and later by MySQL AB. It started out as a rewrite of the mSQL database and gradually acquired more and better features. MySQL is released under a dual-licensing scheme (the GPL and a paid commercial license).
Since the PostgreSQL developers had a head start of almost 10 years, PostgreSQL has so far offered more features than MySQL, especially the advanced features that are desirable in an “enterprise” computing environment. These include advanced database storage, data management tools, data replication, and backup tools. MySQL, on the other hand, has traditionally had an edge over PostgreSQL in terms of speed, offering better performance for concurrent database access. Lately, however, this gap has been closing: PostgreSQL is getting faster, while MySQL is acquiring more enterprise features. The crucial 5.0 release of MySQL in October 2005 added stored procedures, triggers, and views.
Let’s look at the commonalities first. Both systems are fully relational, using SQL for data definition, data manipulation, and data retrieval. They run on Windows, Linux, Mac OS X, and a number of Unix variants. Both databases come with graphical administration tools, a query builder, and backup, repair, and optimisation tools. They offer standard connectors such as ODBC and JDBC, as well as APIs for all major programming languages. Both systems support foreign keys and data integrity, subselects, transactions, unions, views, stored procedures, and triggers. Among the high-end features that both RDBMS offer are ACID-compliant transaction processing, multiple isolation levels, procedural languages, schemas (metadata), hot backups, data loading, replication (as an add-on in PostgreSQL), table spaces for disk storage layout, terabyte scalability, and SSL. MySQL and PostgreSQL also both support the storage of geographic information (GIS). PostgreSQL additionally has network-aware data types that recognize IPv4 and IPv6 addresses.
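To illustrate what “network-aware” means: PostgreSQL’s inet and cidr types store addresses and networks as values the server understands, so a query can ask directly whether an address falls inside a subnet (for instance with the <<= containment operator). The Python sketch below performs the same check client-side with the standard ipaddress module. The helper name in_network is hypothetical; it only mimics the semantics and does not talk to a database.

```python
import ipaddress


def in_network(address: str, network: str) -> bool:
    """Return True if address lies inside network (IPv4 or IPv6).

    Hypothetical helper mirroring what PostgreSQL evaluates for
    inet '10.1.2.3' <<= cidr '10.1.0.0/16'.
    """
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network, strict=False)
    # Mixing IPv4 and IPv6 is allowed; such a test simply yields False.
    return addr in net


print(in_network("10.1.2.3", "10.1.0.0/16"))      # True
print(in_network("2001:db8::1", "2001:db8::/32"))  # True
print(in_network("192.168.0.1", "10.0.0.0/8"))     # False
```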
Now, let’s look at the differences. PostgreSQL is an object-relational database, which means that it has object-oriented features such as user-definable database objects and inheritance. Users can define data types, indexes, operators (which can be overloaded), aggregates, domains, casts, and conversions. PostgreSQL also supports array data types. Inheritance allows a table to inherit its columns from a parent table. PostgreSQL also has very advanced programming features. In addition to its native procedural language, PL/pgSQL (which resembles Oracle’s PL/SQL), PostgreSQL procedures can be written in scripting languages such as Perl, PHP, and Python, or in compiled languages such as C++ and Java. In contrast, MySQL (since version 5.0) supports only a native procedural language that follows the SQL:2003 standard.
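Table inheritance is easiest to grasp from its query semantics. In PostgreSQL, a child declared with CREATE TABLE capitals (...) INHERITS (cities) receives all of the parent’s columns, and SELECT * FROM cities also returns the rows stored in capitals (projected onto the parent’s columns), whereas SELECT * FROM ONLY cities does not. The toy model below reproduces just that behavior in Python; the Table class and its methods are hypothetical, and the sample rows are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy model of a table with inheriting child tables (hypothetical)."""
    name: str
    columns: list[str]
    rows: list[dict] = field(default_factory=list)
    children: list["Table"] = field(default_factory=list)

    def inherit(self, name: str, extra_columns: list[str]) -> "Table":
        # A child table gets all parent columns plus its own extra columns.
        child = Table(name, self.columns + extra_columns)
        self.children.append(child)
        return child

    def select(self, only: bool = False) -> list[dict]:
        # only=False mimics SELECT * FROM parent: child rows are included,
        # projected onto the parent's columns. only=True mimics FROM ONLY.
        result = [dict(r) for r in self.rows]
        if not only:
            for child in self.children:
                for row in child.select():
                    result.append({c: row[c] for c in self.columns})
        return result


cities = Table("cities", ["name", "population"])
capitals = cities.inherit("capitals", ["state"])
cities.rows.append({"name": "Springfield", "population": 100_000})  # toy data
capitals.rows.append({"name": "Sacramento", "population": 500_000, "state": "CA"})

print([r["name"] for r in cities.select()])           # ['Springfield', 'Sacramento']
print([r["name"] for r in cities.select(only=True)])  # ['Springfield']
```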
[Figure: PostgreSQL/MySQL comparison chart]
The most evident advantage that MySQL offers in terms of features is its support for so-called pluggable storage engines. One may choose from a number of different data storage models, even on a per-table basis, which allows the database administrator to optimise databases for the intended application. For example, a web application that makes heavy use of concurrent reads with few write operations may use the MyISAM storage engine to achieve top performance, while an online booking system may use the InnoDB storage engine for ACID-compliant transactions. Another interesting characteristic of MySQL not found in PostgreSQL is its support for distributed databases, which goes beyond mere database replication. Functionality for distributed data storage is offered through the NDB and FEDERATED storage engines, which support clustered and remote databases, respectively.
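Because the storage engine is chosen per table with an ENGINE clause in CREATE TABLE (and can later be changed with ALTER TABLE ... ENGINE=...), the rule of thumb above can be expressed as a small DDL generator. This is only a sketch: the helper create_table_ddl and the table definitions are hypothetical, and it covers just MyISAM and InnoDB, since NDB and FEDERATED tables need additional cluster or connection configuration.

```python
def create_table_ddl(table: str, columns: list[str], transactional: bool) -> str:
    """Build a MySQL CREATE TABLE statement with an explicit storage engine.

    Hypothetical helper: InnoDB for tables that need ACID transactions,
    MyISAM for read-mostly tables with few writes.
    """
    engine = "InnoDB" if transactional else "MyISAM"
    return f"CREATE TABLE {table} ({', '.join(columns)}) ENGINE={engine};"


# A read-heavy logging table and a transactional booking table.
print(create_table_ddl("page_views", ["id INT PRIMARY KEY", "url VARCHAR(255)"], False))
# CREATE TABLE page_views (id INT PRIMARY KEY, url VARCHAR(255)) ENGINE=MyISAM;
print(create_table_ddl("bookings", ["id INT PRIMARY KEY", "seat VARCHAR(8)"], True))
```

Both tables can live in the same database, which is what makes the per-application tuning described above possible.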
There are further differences, of course. MySQL has traditionally been faster than PostgreSQL. It runs as a single multithreaded server process and handles each new connection in a thread, whereas PostgreSQL spawns a new server process for every connection. Because creating a thread is much cheaper than starting a process, this is a significant advantage for web applications that connect on each page view. In addition, the MyISAM storage engine provides very high performance for both simple and complex SELECT statements. Stability is another advantage of MySQL. Due to its larger user base, MySQL has been tested more intensively, and it has historically been more stable than PostgreSQL.
PostgreSQL has a slight advantage over MySQL for concurrent transactions, because it uses Multiversion Concurrency Control (MVCC) throughout, a mechanism otherwise found mainly in enterprise-grade commercial RDBMS. In MySQL, multiversioning is available only through the InnoDB storage engine; MyISAM relies on table-level locking. Another advantage of PostgreSQL is its relatively strict compliance with the SQL-92 and SQL:1999 standards, especially with regard to data types. MySQL’s implementation of ANSI SQL is less complete by comparison. However, MySQL has a special ANSI mode that makes its behavior more standard-conforming.
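The core idea of MVCC is that writers never overwrite data in place: each committed change creates a new version, and every reader works against a snapshot, so readers and writers do not block each other. The minimal, single-threaded Python sketch below shows only this visibility rule. The MVCCStore class is hypothetical and omits everything a real engine needs, such as multi-statement transactions, write conflicts, and cleanup of old versions (which PostgreSQL performs with VACUUM).

```python
import itertools


class MVCCStore:
    """Toy multiversion key-value store (hypothetical, single-threaded)."""

    def __init__(self):
        self._commit_ids = itertools.count(1)  # increasing commit ids
        self._versions = {}                    # key -> [(commit_id, value), ...]
        self._last_commit = 0

    def snapshot(self) -> int:
        # A snapshot is the newest commit id the reader is allowed to see.
        return self._last_commit

    def write(self, key, value) -> int:
        # Never update in place: append a new version instead.
        commit_id = next(self._commit_ids)
        self._versions.setdefault(key, []).append((commit_id, value))
        self._last_commit = commit_id
        return commit_id

    def read(self, key, snapshot: int):
        # Newest version committed at or before the snapshot, or None.
        visible = [v for cid, v in self._versions.get(key, []) if cid <= snapshot]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("seat_12A", "free")
reader = store.snapshot()             # a long-running reader starts here
store.write("seat_12A", "booked")     # a concurrent writer commits meanwhile
print(store.read("seat_12A", reader))            # free
print(store.read("seat_12A", store.snapshot()))  # booked
```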
In terms of backup and restore capabilities, MySQL provides somewhat less convenience than PostgreSQL and commercial enterprise RDBMS. Nevertheless, hot backup and restore operations can be performed with both systems. Both PostgreSQL and MySQL/InnoDB allow transactional tables to be backed up simply by using a single transaction that copies all relevant tables. The disadvantage of this method is that it uses a lot of resources, which can degrade system performance while the backup runs.
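In practice, the single-transaction method is what the standard dump tools implement: mysqldump reads InnoDB tables from one consistent snapshot when given --single-transaction, and pg_dump always reads from a single consistent snapshot. The Python sketch below only assembles the command lines. The helper functions are hypothetical, the database and file names are placeholders, and connection and authentication options are omitted.

```python
import shlex


def mysql_hot_backup_cmd(database: str, outfile: str) -> list[str]:
    # --single-transaction dumps transactional (InnoDB) tables from one
    # consistent snapshot; it does not protect MyISAM tables.
    return ["mysqldump", "--single-transaction", f"--result-file={outfile}", database]


def postgres_hot_backup_cmd(database: str, outfile: str) -> list[str]:
    # pg_dump reads from a single transaction snapshot by default;
    # the custom format is compressed and suitable for pg_restore.
    return ["pg_dump", "--format=custom", f"--file={outfile}", database]


print(shlex.join(mysql_hot_backup_cmd("shop", "shop.sql")))
# mysqldump --single-transaction --result-file=shop.sql shop
print(shlex.join(postgres_hot_backup_cmd("shop", "shop.dump")))
```

On a host with the client tools installed, either list can be passed to subprocess.run(cmd, check=True). The whole dump still runs as one long transaction, which is where the resource cost mentioned above comes from.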
With MySQL, a better solution is to use the replication mechanism for a continuous backup. PostgreSQL allows recovery from disk failure through point-in-time recovery (PITR). This method combines file-system-level backups with a write-ahead log (WAL) that records all changes to the database. Replaying the log on top of a base backup makes it possible to recreate the state of the database at any point in time after that backup. In most cases, a crashed database can be recovered up to the last transaction before the crash. PITR is also convenient for large databases, since it conserves resources: the database does not have to be dumped in full over and over again.
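Point-in-time recovery boils down to two steps: restore the file-system-level base backup, then replay the write-ahead log in commit order and stop at the chosen target time. The toy Python model below captures just that. The LogRecord type and the point_in_time_restore function are hypothetical; in PostgreSQL 8.x the real procedure is driven by a recovery.conf file (restore_command, recovery_target_time) rather than by application code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogRecord:
    """One logged change (hypothetical, simplified WAL entry)."""
    committed_at: datetime
    key: str
    value: str


def point_in_time_restore(base_backup: dict, wal: list[LogRecord],
                          target: datetime) -> dict:
    """Rebuild the database state as of `target` (toy model of PITR)."""
    state = dict(base_backup)  # start from a copy of the base backup
    for rec in sorted(wal, key=lambda r: r.committed_at):
        if rec.committed_at > target:  # stop once past the recovery target
            break
        state[rec.key] = rec.value     # redo the logged change
    return state


base = {"balance": "100"}
wal = [
    LogRecord(datetime(2006, 1, 1, 9, 0), "balance", "150"),
    LogRecord(datetime(2006, 1, 1, 9, 30), "balance", "0"),  # e.g. a faulty update
]
print(point_in_time_restore(base, wal, datetime(2006, 1, 1, 9, 15)))  # {'balance': '150'}
```

Passing datetime.max as the target replays the entire log, which corresponds to recovering up to the last transaction before a crash.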
Advantages of MySQL:
- Excellent code stability
- Excellent performance, fast CONNECT and SELECT
- Multiple storage engines to choose from
- Larger user base (thus larger number of applications and libraries)
- Support for distributed databases
- Many high-quality GUI tools available
- Commercial support widely offered
Advantages of PostgreSQL:
- Object-oriented features
- Advanced programming concepts
- Supports multiple programming languages
- High ANSI SQL conformance
- Mature high-end features
- Robust online backups
- Very liberal BSD license
In summary, PostgreSQL and MySQL are both mature products with many enterprise-level features. They are both catching up with the best commercial RDBMS and are presently making inroads into the high-end market. The philosophies of the two systems differ in several ways. Roughly speaking, MySQL is targeted at developers who expect a workhorse database with proven performance, while PostgreSQL suits developers who expect advanced features and programming concepts. MySQL offers more deployment options, whereas PostgreSQL offers more flexibility for developers.