Sunday, December 13, 2009

DRBD and MySQL - Excellent Low-cost HA Solution

Introduction

DRBD is a Linux project that provides real-time replication at the block-device level. Although it has some clustering support, it's not a true clustered filesystem like OCFS2 or GFS2.

What it provides is a mirrored copy of a block device across a network.

HA with MySQL Replication

MySQL high availability is often implemented with its built-in replication technology. The standard master-slave configuration designates one database as the primary, which receives all write traffic, that is, all changes to data.


Read traffic, i.e. SELECT queries, can be sent to the primary or to the replicated slaves.


Transactions flow on the primary database into its binary log. The slave keeps a watchful eye on the primary, copying new transactions into its own relay log.


Keep in mind that until version 5.1, which added row-based logging, this meant copying the *actual* SQL statements, albeit in binary form.


Once they reach the slave, a second thread applies those SQL statements serially, which in theory keeps the slave in the same state as the primary.
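To make that flow concrete, here is a toy model in Python. It is only a sketch: the class names, the key/value "statements", and the io_thread/sql_thread method names are my own stand-ins, not MySQL's actual implementation.

```python
# Toy model of the replication flow described above; not MySQL internals.

class Primary:
    def __init__(self):
        self.data = {}
        self.binary_log = []  # ordered record of every committed write

    def execute(self, key, value):
        """Apply a write locally, then record it in the binary log."""
        self.data[key] = value
        self.binary_log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.applied = 0  # number of relay-log events already applied

    def io_thread(self, primary):
        """Copy binary-log events not yet fetched into the relay log."""
        self.relay_log.extend(primary.binary_log[len(self.relay_log):])

    def sql_thread(self):
        """Apply pending relay-log events strictly in order."""
        while self.applied < len(self.relay_log):
            key, value = self.relay_log[self.applied]
            self.data[key] = value
            self.applied += 1


primary = Primary()
replica = Replica()
primary.execute("a", 1)
primary.execute("b", 2)
primary.execute("a", 3)

replica.io_thread(primary)   # fetch new events
replica.sql_thread()         # replay them serially
print(replica.data == primary.data)
```

As long as every replayed event has exactly the same effect on both sides, the two copies stay identical.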


The trouble comes when replication stops and the error log shows an error about a duplicate key or a failed primary key constraint.


How's that possible? If all the same transactions are being applied in serial, the two databases should never have a case like this. Strange indeed.


It turns out that, as we all know, a lot of things can happen while a query executes that interrupt it or otherwise cause anomalous behavior.


Mixing InnoDB and MyISAM tables (i.e. transactional and non-transactional) is one way to get into trouble, but there are others.


Various MySQL functions are non-deterministic, meaning they can return a different result each time they run. Time functions and user functions are two examples.
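A small Python sketch shows why replaying a non-deterministic statement causes drift. The insert_with_uuid helper is hypothetical and merely stands in for a statement such as an INSERT that calls UUID(). Shipping the resulting row instead of the statement (the row-based approach) avoids the problem.

```python
import uuid

def insert_with_uuid(table):
    """Stand-in for a statement like INSERT ... VALUES (UUID())."""
    table.append(str(uuid.uuid4()))

# Statement-based replay: the same statement runs again on the replica,
# so the function is evaluated a second time and yields a new value.
primary_table, replica_table = [], []
insert_with_uuid(primary_table)
insert_with_uuid(replica_table)
statement_based_match = primary_table == replica_table

# Row-based replay: the row produced on the primary is copied as-is.
primary_rows, replica_rows = [], []
insert_with_uuid(primary_rows)
replica_rows.extend(primary_rows)
row_based_match = primary_rows == replica_rows

print(statement_based_match, row_based_match)
```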


So the upshot is that MySQL replication is not at all bulletproof out of the box, and most master-slave setups eventually develop data differences between master and slave.


There are checksum tools that can detect these differences, and there is a list of non-deterministic functions against which you can audit your application to make it fully safe for MySQL replication. That is a lot of work, but it can certainly be done.
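The idea behind such checksum tools can be sketched in a few lines of Python. This is an illustration only, assuming each table is available as a list of row tuples. The function names and sample data are made up, and real tools work against live databases in chunks.

```python
import hashlib

def table_checksum(rows):
    """Hash rows in sorted order so both sides hash the same sequence."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

def find_drift(primary_tables, replica_tables):
    """Return the sorted names of tables whose checksums differ."""
    names = set(primary_tables) | set(replica_tables)
    return sorted(
        name for name in names
        if table_checksum(primary_tables.get(name, []))
        != table_checksum(replica_tables.get(name, []))
    )

# Made-up sample data: one row in "orders" has drifted on the replica.
primary = {"users": [(1, "ann"), (2, "bob")], "orders": [(10, 1, 9.99)]}
replica = {"users": [(1, "ann"), (2, "bob")], "orders": [(10, 1, 8.99)]}
print(find_drift(primary, replica))  # ['orders']
```

A checksum only tells you that a table differs. Finding and repairing the offending rows is a separate step.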


What is DRBD?

Acronyms are typically cryptic in computer science, and DRBD is no exception. The initials stand for Distributed Replicated Block Device.


What does that mean exactly? Well, the filesystem is part of your operating system, a layer between your applications and the disk hardware, which may be a single disk or a disk I/O subsystem such as a RAID array.


The disk hardware is exposed to the OS as a block device, on which data is read and written in discrete chunks called blocks. The filesystem sits on top and issues those block requests.


Ok, enough about filesystems and block devices. What's DRBD then? Well, it lets you keep a mirror copy of a block device on another machine in real time. Impressive, right?
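Here is a minimal Python sketch of the mirroring idea. The block size, class names, and in-memory "devices" are assumptions made for illustration. Real DRBD does this in the kernel and sends the peer's copy over the network.

```python
BLOCK_SIZE = 4096  # bytes per block; an assumed value for this sketch

class BlockDevice:
    """An in-memory stand-in for a disk: a fixed array of equal-sized blocks."""
    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]


class MirroredDevice:
    """Forwards every write to both a local device and a peer device."""
    def __init__(self, local, peer):
        self.local = local
        self.peer = peer

    def write(self, index, data):
        self.local.write(index, data)
        self.peer.write(index, data)  # stands in for the network hop

    def read(self, index):
        return self.local.read(index)  # this sketch serves reads locally


local, peer = BlockDevice(8), BlockDevice(8)
mirror = MirroredDevice(local, peer)
payload = b"x" * BLOCK_SIZE
mirror.write(3, payload)
print(peer.read(3) == payload)
```

Whatever sits on top, a filesystem or a database, only sees an ordinary block device, which is why nothing above this layer needs to know about the mirror.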


Since it's a block device, you'll use it with a filesystem such as ext2 or one of the journaling filesystems such as ext3, ReiserFS, or JFS. Keep in mind, though, that it isn't a true clustered filesystem.


It's primarily built to have one side active at a time.


As of version 8 it *can* be used with clustered filesystems such as OCFS2 and GFS2. In that case both sides are mounted as primaries, and the clustered filesystem has to coordinate concurrent access so the two nodes don't overwrite each other's blocks.


These filesystems can be slow, so keep that in mind if you're leaning in that direction.


MySQL and DRBD Working Together

So we've seen some of the limitations of doing high availability with basic out-of-the-box MySQL replication.


So how can DRBD help? Well it turns out that MySQL interacts with the disk through the filesystem, so a MySQL database can sit on top of one of these magical distributed or mirrored block devices that DRBD provides.


In effect, DRBD keeps a complete copy of your database on another machine, available if and when you need it.


In most production setups you would put your data directory on the mirrored block device but keep your configuration and server log files (such as the error log) elsewhere.


That would allow you to have a slightly different setup on both servers, or have separate logfiles.


There are many reasons to want to separate those out. The main thing is that your core database files, InnoDB tablespace files, and binary logs are on the DRBD volume.
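One way to sanity-check such a layout is a small Python script like the sketch below. The mount point /mnt/drbd and all the sample paths are hypothetical. The option names (datadir, innodb_data_home_dir, log_bin, log_error) are standard MySQL settings, but which of them you replicate is your own design decision.

```python
from pathlib import PurePosixPath

DRBD_MOUNT = PurePosixPath("/mnt/drbd")  # hypothetical mount point

# Options whose paths should live on the mirrored volume (an assumption
# based on the layout described above).
REPLICATED_OPTIONS = ("datadir", "innodb_data_home_dir", "log_bin")

def misplaced_options(settings, mount=DRBD_MOUNT):
    """Return the replicated options whose path is not under the mount point."""
    bad = []
    for option in REPLICATED_OPTIONS:
        path = PurePosixPath(settings[option])
        if path != mount and mount not in path.parents:
            bad.append(option)
    return bad

# Hypothetical settings; log_bin is deliberately left on local disk.
settings = {
    "datadir": "/mnt/drbd/mysql/data",
    "innodb_data_home_dir": "/mnt/drbd/mysql/innodb",
    "log_bin": "/var/log/mysql/mysql-bin",
    "log_error": "/var/log/mysql/error.log",  # local on purpose
}
print(misplaced_options(settings))  # ['log_bin']
```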


Want to throw in automatic failover? That's where the Heartbeat project comes in. Heartbeat is the smarts behind deciding whether a node is down.


When it determines that a node is down, it runs the steps you define to bring the service up on the other node.
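The decision logic can be sketched in Python as follows. This is not Heartbeat's code or configuration. The FailureDetector class, the timeout value, and the list of takeover steps are assumptions chosen to illustrate the idea of "no heartbeat for too long, so run the takeover steps in order".

```python
class FailureDetector:
    """Declares the peer down after `timeout` seconds without a heartbeat."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = 0.0

    def heartbeat(self, now):
        """Record that the peer was heard from at time `now`."""
        self.last_seen = now

    def peer_is_down(self, now):
        return now - self.last_seen > self.timeout


# Hypothetical takeover steps for a DRBD + MySQL pair, in order.
TAKEOVER_STEPS = [
    "promote the local DRBD resource to primary",
    "mount the filesystem on the DRBD device",
    "start mysqld",
]

def check_and_failover(detector, now, run_step):
    """Run every takeover step in order if the peer looks dead."""
    if not detector.peer_is_down(now):
        return False
    for step in TAKEOVER_STEPS:
        run_step(step)
    return True


detector = FailureDetector(timeout=10.0)
detector.heartbeat(now=100.0)
executed = []
first = check_and_failover(detector, now=105.0, run_step=executed.append)
second = check_and_failover(detector, now=111.0, run_step=executed.append)
print(first, second, executed)
```

The order of the steps matters: the database cannot start until the filesystem is mounted, and the filesystem cannot be mounted until the local DRBD device has been promoted.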


Conclusion

DRBD is definitely an exciting technology and a great match for MySQL. It provides a more bulletproof HA solution when you want seamless failover.


It requires less maintenance and avoids many of the headaches of built-in MySQL replication that every seasoned MySQL DBA has had to deal with.


In next month's article, we'll discuss how to set up DRBD with MySQL. We'll show you how to build a couple of virtual machines, install DRBD from RPM, add a simple configuration, and start it up.


Then we'll take you through creating a simple MySQL database, and finally perform a few tests to illustrate the failover feature and its effectiveness.
