Tuesday, July 12, 2011

PostgreSQL vs. MySQL: Which Is the Best Open Source Database?

When you’re choosing a database, you’re making a long-term decision, because changing your mind later is difficult and expensive. You want to get it right the first time. Two popular open source databases, MySQL and PostgreSQL, are often the final two candidates when admins are preparing a new rollout. This high-level overview of these two open source powerhouses should help you choose which is more appropriate for your needs.

PostgreSQL

PostgreSQL (pronounced Post-gress-cue-ell) bills itself as the world’s most advanced open source database. Some of its fans say it is as good as Oracle, but without the baggage of high cost and snooty customer service. It has a long history, having been developed originally in 1985 at the University of California, Berkeley, as a descendant of the Ingres database.
PostgreSQL is a 100% community-driven open source project, maintained by a worldwide community of more than a thousand contributors. It provides a single completely functional version, rather than the multiple different community, commercial, and enterprise versions that MySQL offers. Its license is a liberal BSD/MIT-type, which allows organizations to use, copy, modify, and redistribute code with only a copyright notice required.
Reliablity is PostgreSQL’s top priority. It is known for being rock-solid and well-engineered, capable of supporting high-transaction, mission-critical applications. Documentation is first-rate, with comprehensive manuals available for free online, along with archives of manuals for older releases. Community support is excellent, and commercial support is available from independent vendors.
Data consistency and integrity are also high priorities. PostgreSQL is fully ACID-compliant (atomicity, consistency, isolation, durability.) It has strong security for controlling access to the database, making good use of enterprise security tools such as Kerberos and OpenSSL. You can define your own checks to ensure data quality according to your own business rules. A favorite feature of many admins is point-in-time recovery (PITR), a flexible high-availability feature with powers such as the ability to create a warm standby server for fast failover, and snapshots and restores to specific points in time. But that’s not all – the project provides several methods to manage PostgreSQL for high availability, load-balancing, and replication, so you can use what fits your particular needs.

MySQL

MySQL is a relative youth, first appearing in 1994. It calls itself the world’s most popular open source database. MySQL is the M in LAMP, the software bundle frequently used for web development that also includes Linux, Apache, and Perl/PHP/Python. Most applications built on a LAMP stack incorporate MySQL, including such well-known applications as WordPress, Drupal, Zend, and phpBB.
Initially MySQL was designed to be a fast web server back end, using a fast indexed sequential access method (ISAM), with no ACID support. Since those lean, speedy early days MySQL has added support for a number of additional storage engines, and ACID compliance is now available via the InnoDB engine. MySQL also supports other storage engines, providing capabilities such as temporary tables using the MEMORY storage engine, an example for developers with the EXAMPLE storage engine, fast read-mostly databases using the MyISAM engine, plus several other core storage engines and a number of third-party engines.
MySQL documentation is abundant, and includes good free reference manuals, many books and online articles, and training and support from Oracle and third-party vendors.
MySQL has gone through changes in ownership and a fair bit of drama in recent years. It was first developed by MySQL AB, which sold itself to Sun Microsystems for a cool billion dollars in 2008. Sun was in turn acquired by Oracle in 2010. Oracle supports multiple editions: Standard, Enterprise, Classic, Cluster, Embedded, and Community. Some of these are free downloads, some cost money. The core code is GPL, and commercial licenses are available for developers and vendors who prefer not to use the GPL.
Nowadays there are even more choices for databases based on the original MySQL code, because several key MySQL developers have released MySQL forks. Michael “Monty” Widenius, one of the original founders of MySQL, appeared to develop a case of seller’s regret after the Sun sale, and developed his own MySQL fork, MariaDB, free of cost and licensed under the GPL. Drizzle, a fork by prominent MySQL developer Brian Aker, is a substantial rewrite and change in core concepts that is optimized for multi-CPUs, cloud and net applications, and massive concurrency.

Platforms and Workloads

Both MySQL and PostgreSQL power some of the Web’s highest-demand Web sites:
MySQL:
  • Slashdot
  • Twitter
  • Facebook
  • Wikipedia
PostgreSQL:
  • Yahoo runs a multi-petabyte modified PostgreSQL database that processes billions of events per day
  • Reddit
  • Disqus
Both MySQL and PostgreSQL run on multiple operating systems: Linux, Unix, Mac OS X, and Windows. Both are open source and free of cost, so the only cost for testing them is your time and hardware. Both are flexible and scale well for uses ranging from small deployments to giant distributed systems. MySQL goes one level smaller than PostgreSQL, down into the embedded space, with libmysqld. PostgreSQL does not support embedded applications, sticking instead with traditional client/server architecture.
Optimize your open source deployments MySQL is often thought of as the speedy database back end for websites and applications, performing fast reads and numerous small queries, but offering fewer sophisticated features and data integrity checks. PostgreSQL is considered the solemn, full-featured, no-nonsense workhorse for transactional enterprise applications, with strong ACID compliance and many data integrity checks. Each is faster at some tasks, and MySQL performs differently with different storage engines. The MyISAM engine for MySQL is by far the fastest, because it performs the fewest data integrity checks. It works great as a back end for busy read-mostly websites, but it is a disaster for any read/write database containing sensitive data, because MyISAM tables will inevitably become corrupted. MySQL has good tools for recovering corrupted MyISAM tables, but for sensitive data InnoDB, which is ACID-compliant, is a better choice.
In contrast, PostgreSQL is a complete integrated database server with a single storage engine. You can improve performance by tweaking parameters in postgresql.conf, and tweaking your queries and transactions. The PostgreSQL documentation goes into detail on fine-tuning performance.
Both MySQL and PostgreSQL are very configurable and can tuned to optimize performance for different tasks. Both support extensions for added functionality.
A common misconception is that MySQL is easier to learn than PostgreSQL. Relational database management systems are all complex, finicky beasts, and these two have comparable learning curves.

Standards Compliance

PostgreSQL aims for SQL standards compliance (the current standard is ANSI-SQL:2008). MySQL is mostly SQL-compliant, but also has its own extensions and support for non-SQL features, which are documented in the reference manual. There are pros and cons to each approach. Standards adherence makes life easier for database admins, database developers, and application developers, because it means they have only one standard to learn, a common set of features and commands to use and support, and their code is portable. This adds up to savings in time and effort, and freedom from vendor lock-in.
Arguments in favor of using non-compliant customizations include freedom to quickly roll out useful new features, rather than waiting for them to work through the standards process. ANSI/ISO standards change and evolve, so standards compliance is a moving target anyway; the big-name RDBMSes, such as Microsoft SQL Server, Oracle, and IBM’s DB2 are only partially compliant. Many of the MySQL’s customizations are not exactly game-changers; for example, the differences in the way single and double quotes and escape characters behave in MySQL don’t seem worth the bother or the potential for errors. MySQL adds even more complexity by supporting multiple SQL modes, such as ANSI, to conform more closely to standard traditional SQL for strict data input checking and various other strict and relaxed data-checking modes.

In Conclusion

Despite their different histories, engines, and tools, no clear differentiator distinguishes either PostgreSQL or MySQL for all uses. Many organizations favor PostgreSQL because it is so reliable and so good at protecting data, and because, as a community project, it is immune to vendor follies. MySQL is more flexible and has more options for being tailored for different workloads. Most times an organization’s proficiency with a particular piece of software is more important than differences in feature sets, so if your organization is already using one of these, that is a good reason to stick with it. If you held my dogs hostage and forced me to choose a database for a new project, I would pick PostgreSQL for all tasks, including Web site backends, because of its rock-solid reliability and data integrity. And I would keep Drizzle running on a test machine, to stay acquainted with it, until it is ready for prime time, and then roll it out for cloud and application servers.

No comments:

Post a Comment