Saturday, January 8, 2011

Introduction to RAID

RAID is one of those technologies that has really revolutionized storage. In this article, we’ll review the seven most common single RAID levels and describe how each works and what issues surround them.

Introduction
One of the most common techniques to improve either data reliability or data performance (or both) is called RAID (Redundant Array of Inexpensive Disks).

The concept was developed in 1987 by David Patterson, Garth Gibson, and Randy Katz as a way to use several inexpensive disks to create a single disk from the perspective of the OS while also achieving enhanced reliability or performance or both.


Before anyone erupts and says that RAID does not stand for “Redundant Array of Inexpensive Disks”, let me start by stating that was the original definition.

Over time, the definition has become more commonly known as “Redundant Array of Independent Disks” perhaps so the word “inexpensive” isn’t associated with RAID controllers or disks.

Personally I use the original definition but regardless, either definition means that the disks are independent of one another.

Feel free to use either definition since it won’t change the content of this article.

Now, back to our discussion of RAID.
When the original paper was issued, five different RAID levels or configurations were defined. Since that time other RAID configurations have been developed including what are referred to as “hybrid” RAID configurations.

The RAID Advisory Board (RAB) was created to help advise the IT community on the defined RAID configurations and to help the creation of new RAID configuration definitions.

While it is not an organization that creates legally binding standards and labeling, it does help in clarifying what the RAID levels mean and what is commonly accepted in the community.

There was a time where companies were creating very strange RAID configurations and using strange labels, causing great confusion.

The RAB has helped to reduce the proliferation of “weird” RAID configurations and labeling and standardize the meaning of various RAID levels.

In this article I want to review the seven most common standard RAID configurations. But I will also very briefly touch on some of the hybrid RAID configurations.

For each RAID level, I will describe how it works as well as the configuration’s particular pros and cons.

However, before starting I want to clarify one thing: RAID is not meant as a replacement for backups. RAID can help improve data reliability which really means data availability (improving uptime for data) and/or data performance (I/O performance).

It is not intended as a replacement for backups or keeping multiple independent copies of your data.

RAID Configurations


As mentioned above, there were five original RAID levels or configurations that were defined but others have been developed since that original article.

In RAID terminology each distinct RAID configuration is given a number which can also be called a RAID “level”.

The core RAID configurations are listed as: RAID-0, RAID-1, RAID-2, RAID-3, RAID-4, RAID-5, and RAID-6.


RAID-0
This RAID configuration is really focused on performance since the blocks are basically striped across multiple disks.

Figure 1 from wikipedia (image by Cburnett) illustrates how the data is written to two disks.
Figure 1: RAID-0 layout (from Cburnett at wikipedia under the GFDL license)

In this illustration, the first block of data, A0, is written to the first disk, the second block of data, A1, is written to the second disk, the third block of data, A2, is written to the first disk, and so on.

If the I/O is happening fast enough data blocks can be written almost simultaneously (i.e. A0 and A1 are written at just about the same time).

Since the data is broken up into block sized units between the disks, it is commonly said that the data is striped across the disks.

As you can see, striping data across the disks means that the overall write performance of the disk set is very fast, usually much faster than a single disk.
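As a rough illustration of the striping described above, here is a minimal Python sketch that maps a logical block number to a disk and a position on that disk. The function name `raid0_location` and the simple round-robin placement are assumptions made for this example; real controllers may use larger stripe units or different layouts.

```python
def raid0_location(block_index, num_disks):
    """Map a logical block number to (disk, row) for round-robin striping."""
    disk = block_index % num_disks    # which disk holds the block
    row = block_index // num_disks    # position of the block on that disk
    return disk, row

# Blocks A0..A3 on a two-disk array, numbered as in the text above.
layout = [raid0_location(i, 2) for i in range(4)]
print(layout)
```

Consecutive blocks land on different disks, which is why A0 and A1 can be written at nearly the same time.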


Reading from a RAID-0 group is also very fast. A read request comes in and the RAID controller, which controls the placement of data, knows that it can read A0 and A1 at the same time since they are on separate disks, basically doubling the potential read performance relative to a single disk.


You can have as many disks as you want in a RAID-0 array (a group of disks in a RAID-0 configuration).

However, one of the downsides to RAID-0 is that there is no additional data redundancy provided by RAID-0 (it is all focused on performance).

No data parity is computed and stored meaning that if you lose a disk in a RAID-0 array, you will lose access to all of the data in the array.

If you can bring the lost disk back into the array without losing any data on it, then you can recover the RAID-0 array, but this is a fairly rare occurrence.


Consequently, we can see that RAID-0 is focused solely on performance with no additional data redundancy beyond the redundancy in a single disk.

This affects how RAID-0 is used. For example, it can be used in situations where performance is paramount and you have a copy of your data elsewhere or the data is not important.

A classic usage case is for scratch space where data is written while an application is running but is not needed once the application is done and the final output is copied to a more resilient storage device.

If a scratch space disk is lost while the application is running, you can rebuild the RAID-0 array with one fewer drive, and rerun the application.


The capacity and failure rate of a RAID-0 array are fairly simple to compute. The capacity is computed as,
Capacity = n * min(disk sizes)

where n is the number of disks in the array and min(disk sizes) is the minimum common capacity across the drives (this indicates that you can use drives of different sizes).

This equation also means that RAID-0 is very capacity effective since it doesn’t waste any space for parity or any other error correction. It uses all of the space for data focusing on performance.

The failure rate is a little more involved but can also be estimated.
MTTFgroup = MTTFdisk / n

where MTTF is the Mean Time To Failure and “group” refers to the RAID-0 array and “disk” refers to a single disk.

So as you add disks, you greatly reduce the MTTF for the RAID-0 array. Having two disks decreases the MTTF by half.

Three disks reduce the MTTF by a factor of 3, and so on. So you can tell why people are reluctant to use RAID-0 for file systems where data availability and reliability are important.
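The two RAID-0 formulas above are easy to try out. The sketch below is only an illustration: the function names, the drive sizes, and the 1,000,000 hour single-disk MTTF are hypothetical values chosen for the example.

```python
def raid0_capacity(disk_sizes):
    """Usable capacity: n * min(disk sizes), in the same units as the input."""
    return len(disk_sizes) * min(disk_sizes)

def raid0_mttf(mttf_disk, n):
    """Estimated array MTTF: MTTFdisk / n."""
    return mttf_disk / n

# Hypothetical array: 500GB, 500GB and 400GB drives,
# each with an assumed MTTF of 1,000,000 hours.
sizes_gb = [500, 500, 400]
capacity_gb = raid0_capacity(sizes_gb)              # 3 * 400
array_mttf = raid0_mttf(1_000_000, len(sizes_gb))   # one third of a single disk
print(capacity_gb, array_mttf)
```

Note how the 400GB drive caps the contribution of the two larger drives, and how the estimated MTTF drops to a third of the single-disk value.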

But, RAID-0 is the fastest RAID configuration and has the best capacity utilization of any RAID configuration discussed in this article.

Table 1 below is a quick summary of RAID-0 with a few highlights.


Table 1 - RAID-0 Highlights
RAID Level: RAID-0
Pros:
  • Performance (great read and write performance)
  • Great capacity utilization (the best of any standard RAID configurations)
Cons:
  • No data redundancy
  • Poor MTTF
Storage Efficiency: 100% assuming the drives are the same size
Minimum Number of disks: 2

RAID-1
RAID-1 is almost the exact opposite of RAID-0 because it uses multiple drives that are mirrors of one another.

Typically two drives are used in RAID-1 but three drive RAID-1 configurations are becoming more common.

RAID-1 takes an incoming block of data to one drive and creates a mirror image (copy) of it on a second drive.

So RAID-1 doesn’t compute any parity of the block - it just copies the entire block to a second drive.

Figure 2 from wikipedia (image by Cburnett) illustrates how the data is written to two disks in RAID-1.
Figure 2: RAID-1 layout (from Cburnett at wikipedia under the GFDL license)



In this illustration when block A1 is written to disk 0, the same block is also written to disk 1. Since the disks are independent of one another, the write to disk 0 and the write to disk 1 can happen at the same time.

However, when the data is read, the RAID controller can read block A1 from disk 0 and block A2 from disk 1 at the same time since the disks are independent.

So overall, the write performance of a RAID-1 array is the same as a single disk, and the read performance is actually faster from a RAID-1 array relative to a single disk.
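To make the mirroring behavior concrete, here is a toy in-memory model in Python. The `Raid1` class, its round-robin read policy, and the `failed` set are all inventions for this sketch and not how any particular controller works; the point is only that writes go to every mirror while reads can be served by any one of them.

```python
class Raid1:
    """Toy in-memory mirror: every block is stored on every disk."""

    def __init__(self, num_disks=2):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()
        self._next = 0  # round-robin pointer used to spread reads

    def write(self, block, data):
        # A write goes to every surviving disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block):
        # Any surviving disk can serve a read; rotate among them.
        alive = [i for i in range(len(self.disks)) if i not in self.failed]
        if not alive:
            raise IOError("all mirrors have failed")
        i = alive[self._next % len(alive)]
        self._next += 1
        return i, self.disks[i][block]

array = Raid1()
array.write("A1", b"hello")
array.write("A2", b"world")
first = array.read("A1")          # served by disk 0
second = array.read("A2")         # served by disk 1
array.failed.add(0)               # simulate losing disk 0
after_failure = array.read("A1")  # still readable from disk 1
```

Each `read` returns the disk that served the request along with the data, so you can see reads alternating between the mirrors and continuing to work after one disk is marked as failed.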


The strength of RAID-1 lies in the fact that the disks contain copies of the data. So if you lose disk 0, the exact same data is also on disk 1.

This greatly improves data reliability or availability.

The capacity of RAID-1 is the following:
Capacity = min(disk sizes)

meaning that the capacity of RAID-1 is limited by the smallest disk (you can use different size drives in RAID-1).

For example, if you have a 500GB disk and a 400GB disk, then the maximum capacity would be 400GB (i.e. 400GB of the 500GB drive is used as a mirror, and the remaining 100GB is not used).

RAID-1 has the lowest capacity utilization of any RAID configuration.




The reliability or probability of failure is also described in wikipedia. Since the disks are mirrors of one another but still independent, the probability of having both disks fail, leading to data loss, is the following:





P(dual failure) = P(single drive)^2

So the probability of failure of a RAID-1 configuration is the square of the failure probability of a single drive.

Since the probability of failure of a single drive is less than 1, that means that the probability of failure of a RAID-1 array is even smaller than the probability of failure of a single drive.
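As a quick worked example of the formula above, the following Python sketch squares a single-drive failure probability. The 5% figure is purely hypothetical, and the calculation assumes the drive failures are independent, as the text does.

```python
def raid1_failure_probability(p_single, num_mirrors=2):
    """Probability that every mirror fails, assuming independent failures."""
    return p_single ** num_mirrors

p_single = 0.05                      # hypothetical single-drive failure probability
p_array = raid1_failure_probability(p_single)
print(p_array)                       # 5% squared, i.e. about 0.25%
```

The `num_mirrors` parameter is an addition for this example so the same function can be tried with a three drive mirror.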


The reference has a more extensive discussion about the probability of failure but in general, the probability is fairly low.


One might be tempted to use RAID-1 for storing important data in place of backups of the data.

While RAID-1 improves data reliability or availability, it does not replace backups. If the RAID controller fails, or if the unit containing the RAID-1 array suffers some sort of failure, then the data is not available and may even be lost.

Without a backup you don’t have a copy of your data anymore. However, if you make a backup of the data, you would have a copy. The moral of the tale is - make real backups and don’t rely on RAID-1.




Table 2 below is a quick summary of RAID-1 with a few highlights.


Table 2 - RAID-1 Highlights
RAID Level: RAID-1
Pros:
  • Great data redundancy/availability
  • Great MTTF
Cons:
  • Worst capacity utilization of single RAID levels
  • Good read performance, limited write performance
Storage Efficiency: 50% assuming two drives of the same size
Minimum Number of disks: 2

RAID-2
This RAID level was one of the original five defined, but it is no longer really used. The basic concept is that RAID-2 stripes data at the bit level instead of the block level (remember that RAID-0 stripes at the block level) and uses a Hamming code for parity computations.

In RAID-2, the first bit is written on the first drive, the second bit is written on the second drive, and so on. Then a Hamming-code parity is computed and either stored on the disks or on a separate disk.

With this approach you can get very high data throughput rates since the data is striped across several drives, but you also lose a little performance because you have to compute the parity and store it.


A cool feature of RAID-2 is that it can detect single bit errors and recover from them. This prevents data errors or what some people call “bit rot”.
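To show how a Hamming code can locate and fix a single flipped bit, here is a small Python sketch of the classic Hamming(7,4) code. This particular code size is an assumption chosen for illustration; the article does not say which Hamming code a given RAID-2 implementation used, and the function names are made up for the example.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Return (corrected codeword, 1-based position of the flipped bit or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # the syndrome names the bad position
    if pos:
        c[pos - 1] ^= 1          # flip the bad bit back
    return c, pos

codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                  # flip the bit at position 5
repaired, position = hamming74_correct(damaged)
```

In a bit-striped layout you can think of each codeword position as living on its own drive, so a single bad bit from one drive can be identified and corrected from the others.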

For an overall evaluation of RAID-2, there is this link.


According to this article hard drives added error correction that used Hamming codes, so using them at the RAID level became redundant so people stopped using RAID-2.


RAID-3
RAID-3 uses data striping at the byte level and also adds parity computations and stores them on a dedicated parity disk.

Figure 3 from wikipedia (image by Cburnett) illustrates how the data is written to four disks in RAID-3.
Figure 3: RAID-3 layout (from Cburnett at wikipedia under the GFDL license)




This RAID-3 layout uses 4 disks and stripes data across three of them and uses the fourth disk for storing parity information.





So a chunk of data “A” has byte A1 written to disk 0, byte A2 written to disk 1, and byte A3 written to disk 2.





Then the parity of bytes A1, A2, and A3 is computed (this is labeled as Ap(1-3) in Figure 3) and written to disk 3.





The process then repeats until the entire chunk of data “A” is written. Notice that the minimum number of disks you can have in RAID-3 is three (you need 2 data disks and a third disk to store the parity).



RAID-3 is also capable of very high performance while the addition of parity gives back some data reliability and availability compared to a pure striping model à la RAID-0.

Since a byte-level stripe (one byte per data disk) is far smaller than a block, any block-sized request touches all of the disks at the same time, which improves read and write performance.

However, the RAID-3 configuration has some possible side effects.


In particular, this link explains that RAID-3 cannot accommodate multiple requests at the same time.

This results from the fact that a block of data will be spread across all members of the RAID-3 group (minus the parity disk) and the data has to reside in the same location on each drive.

This means that the disks (spindles) have to be accessed at the same time, using the same stripe, which usually means that the spindles have to be synchronized.

As a consequence, if an I/O request for data chunk A comes into the array (see Figure 3), all of the disks have to seek to the beginning of the chunk A and read their specific bytes and send it back to the RAID-3 controller.

Any other data request, such as that for a data chunk labeled B in Figure 3 is blocked until the request for “A” has completed because all of the drives are being used.

The capacity of RAID-3 is the following:
Capacity = min(disk sizes) * (n-1)

meaning that the capacity of RAID-3 is limited by the smallest disk (you can use different size drives in RAID-3) multiplied by the number of drives n, minus one.

The “minus one” part is because of the dedicated parity drive.


RAID-3 has some good performance since it is similar to RAID-0 (striping), but you have to assume some reduction in performance because of the parity computations (this is done by the RAID controller).

However, if you lose the parity disk you will not lose data (the data remains on the other disks). If you lose a data disk, you still have the parity disk so you can recover data.

So RAID-3 offers more data availability and reliability than RAID-0 but with some reduction in performance because of the parity computations and I/O.
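The recovery described above can be sketched in a few lines of Python. This example assumes the parity is a byte-wise XOR of the data, which is the usual choice for single-parity RAID but is not spelled out in the text; the helper name `xor_parity` and the sample bytes are made up for the illustration.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # contents of disks 0, 1, 2
parity = xor_parity(data)                        # stored on the parity disk

# Lose disk 1: XOR the surviving data with the parity to get it back.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]
```

Losing the parity disk instead is even simpler, since the data disks are untouched and the parity can be recomputed from them.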

More discussion about the performance of RAID-3 is contained at this link.


RAID-3 isn’t very popular in the real-world but from time to time you do see it used.

RAID-3 is used in situations where RAID-0 is totally unacceptable because there is not enough data redundancy and the data throughput reduction due to the data parity computations is acceptable.

Table 3 below is a quick summary of RAID-3 with a few highlights.


Table 3 - RAID-3 Highlights
RAID Level: RAID-3
Pros:
  • Good data redundancy/availability (can tolerate the loss of 1 drive)
  • Good read performance since all of the drives are read at the same time
  • Reasonable write performance but parity computations cause some reduction in performance
  • Can lose one drive without losing data
Cons:
  • Spindles have to be synchronized
  • Data access can be blocked because all drives are accessed at the same time for read or write
Storage Efficiency: (n - 1) / n where n is the number of drives
Minimum Number of disks: 3 (have to be identical)

RAID-4

RAID-3 improved data redundancy by adding a parity disk to add some reliability. In a similar fashion, RAID-4 builds on RAID-0 by adding a parity disk to block-level striping.





Since the striping is now down to a block level, each disk can be accessed independently to read or write data, allowing multiple data accesses to happen at the same time.





Figure 4 below from wikipedia (image by Cburnett) illustrates how the data is written to four disks in RAID-4.

Figure 4: RAID-4 layout (from Cburnett at Wikipedia under the GFDL license)



In this layout, data is written in block stripes to the first three disks (disks 0, 1, and 2) while the fourth drive (disk 3) is the parity drive.

The parity of the blocks across the drives is computed by the RAID controller and stored on the dedicated parity drive.

In the figure, the parity for A1, A2, and A3 is listed as Ap on the parity drive.


The dedicated parity drive becomes a performance bottleneck in RAID-4, particularly for write I/O.

Since RAID-4 has block level striping, you can write to blocks A1 and B2 at the same time since they are on different disks.

However, the parity for both blocks has to be written to the same drive which can only accommodate a single write I/O request at a time.

Consequently, one of the parity writes (A1 parity or B2 parity) is blocked and the write I/O performance is reduced.
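To see the bottleneck in numbers, here is a small Python sketch that counts write I/Os per disk. It assumes each small write updates one data block and that stripe's parity block, with no coalescing of writes into full stripes; the function name `raid4_write_load` is hypothetical.

```python
from collections import Counter

def raid4_write_load(block_writes, num_disks):
    """Count write I/Os per disk when each small write also updates parity.

    block_writes is an iterable of logical block numbers; the last disk is
    the dedicated parity disk.
    """
    data_disks = num_disks - 1
    parity_disk = num_disks - 1
    load = Counter()
    for block in block_writes:
        load[block % data_disks] += 1   # the data block itself
        load[parity_disk] += 1          # that stripe's parity block
    return load

load = raid4_write_load(range(6), num_disks=4)
print(dict(load))
```

With six block writes spread over three data disks, each data disk sees two writes while the parity disk sees all six, so it is the first to saturate.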

For more on the performance of RAID-4, please see this link.

The capacity of RAID-4 is the following:
Capacity = min(disk sizes) * (n-1)

meaning that the capacity of RAID-4 is limited by the smallest disk (you can use different size drives in RAID-4) multiplied by the number of drives n, minus one.

The “minus one” part is because of the dedicated parity drive.

However, it is recommended you use drives that are the same size in RAID-4.


RAID-4 improves on the redundancy of RAID-0, which has zero data redundancy, by adding a parity disk.

You can lose one drive without losing data. For example you could lose the parity disk without losing data or you could lose one of the data disks without losing data.

But the introduction of the single dedicated parity drive has reduced write performance relative to RAID-0.

However, if the loss of write performance of RAID-4 is acceptable it does give you more data redundancy than RAID-0.


RAID-4 was one of the five RAID configurations defined in the original RAID paper.

In the real-world, RAID-4 is rarely used because RAID-5 (see next sub-section) has replaced it.




Table 4 below is a quick summary of RAID-4 with a few highlights.



Table 4 - RAID-4 Highlights
RAID Level: RAID-4
Pros:
  • Good data redundancy/availability (can tolerate the loss of 1 drive)
  • Good read performance since all of the drives are read at the same time
  • Can lose one drive without losing data
Cons:
  • Single parity disk (causes bottleneck)
  • Write performance is not that good because of the bottleneck of the parity drive
Storage Efficiency: (n - 1) / n where n is the number of drives
Minimum Number of disks: 3 (have to be identical)

RAID-5
RAID-5 is similar to RAID-4 but now the parity is distributed across all of the drives instead of using a dedicated parity drive.

This greatly improves write performance relative to RAID-4 since the parity is written on all of the drives in the RAID-5 array.

Figure 5 below from wikipedia (image by Cburnett) illustrates how the data is written to four disks in RAID-5.
Figure 5: RAID-5 layout (from Cburnett at wikipedia under the GFDL license)




In this layout, the parity blocks are labeled with a subscript “p” to indicate parity. Notice how they are distributed across all four drives.





The blocks that line up (one block per drive) are typically a “stripe”. In Figure 5 the blocks in a stripe are all the same color.





The data stripe size is simply the following:

Data stripe size = block size * (n-1)

where n is the number of drives in the RAID-5 array. Inside a stripe there is a single parity block and all other blocks are data blocks.
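Here is a minimal Python sketch of a rotating-parity layout and the data stripe size formula above. The rotation rule (parity starts on the last disk and moves one disk to the left for each stripe) is an assumption for this example; controllers and software RAID implementations offer several different rotation schemes. The function names are made up as well.

```python
def raid5_stripe(stripe_index, num_disks):
    """Return a list saying what each disk holds in the given stripe."""
    parity_disk = (num_disks - 1 - stripe_index) % num_disks
    layout = []
    data_number = 1
    for disk in range(num_disks):
        if disk == parity_disk:
            layout.append("P")
        else:
            layout.append(f"D{data_number}")
            data_number += 1
    return layout

def data_stripe_size(block_size, num_disks):
    """Data stripe size = block size * (n - 1)."""
    return block_size * (num_disks - 1)

for stripe in range(4):
    print(raid5_stripe(stripe, 4))

print(data_stripe_size(64 * 1024, 4))   # 64 KiB blocks on 4 drives
```

Each stripe contains exactly one parity block, and over four stripes every drive takes a turn holding it.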

Anytime a block inside the stripe is changed or written to, the parity block is recomputed and rewritten (this is sometimes called the read-modify-write process).

This process can add overhead reducing performance.
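The read-modify-write step can be sketched as follows, assuming XOR parity (the article does not name the parity function, but XOR is the usual choice). The old data and old parity are read, the old data's contribution is removed from the parity, and the new data's contribution is added. The helper names and sample bytes are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def read_modify_write(old_data, old_parity, new_data):
    """Return the new parity after one data block in a stripe changes."""
    # Remove the old block's contribution, then add the new block's.
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

new_block = b"\xff\x00"
new_parity = read_modify_write(stripe[1], parity, new_block)

# Same result as recomputing parity over the whole updated stripe.
full = xor_bytes(xor_bytes(stripe[0], new_block), stripe[2])
assert new_parity == full
```

The shortcut avoids reading the other data blocks in the stripe, but it still costs two reads and two writes for a single logical write, which is where the overhead comes from.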


RAID-5 also has some write performance problems for small writes that are smaller than a single stripe since the parity needs to be computed several times which eats up computational capability of the RAID controller.

As mentioned previously the read-modify-write process that must be followed happens much more often in this case.

The capacity of RAID-5 is very similar to RAID-4 and is the following:
Capacity = min(disk sizes) * (n-1)

meaning that the capacity of RAID-5 is limited by the smallest disk (you can use different size drives in RAID-5) multiplied by the number of drives n, minus one.

The “minus one” part is because of the parity block per stripe.


With RAID-5 you can lose a single drive and not lose data because either the data or the parity for the missing blocks on the lost drive can be found on the remaining drives.

In addition, many RAID controllers allow what is called a hot-spare drive. This drive is typically part of the RAID array but is initially not used for storing data.

If the RAID group loses a drive, the hot-spare is immediately brought into the RAID group by the controller.


In the case of RAID-5, the controller immediately starts redistributing data and parity blocks to this new drive.

To do this, the initial drives in the RAID-5 array have to have all blocks read and the RAID controller has to recompute parity or rebuild missing data blocks.

This combination means that it can take quite a bit of time to fail-over data to the hot-spare drive.

The nice thing about having hot-spare drives is that typically the fail-over process happens automatically so there is almost no delay in incorporating the hot-spare drive.


RAID-5 has been used for a very long time and during this time the data availability and redundancy has been very good.

However, there is a new phenomenon that impacts RAID-5 that has been explained in various articles around the web such as this one.

Basically, drive capacity is growing faster than the Unrecoverable Read Error (URE) rate of drives is improving. This has reached the point where losing a drive in a RAID-5 array and rebuilding it onto a hot-spare drive is almost guaranteed to hit a URE, which means that the RAID-5 array will be lost and the data has to be restored from a backup.

However, this is the subject for another article.
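The URE concern above can be put into rough numbers with a short Python sketch. Everything here is an assumption for illustration: the URE rate of one error per 10^14 bits read is a commonly quoted drive specification rather than a figure from this article, errors are treated as independent, and the drive sizes are hypothetical.

```python
import math

def rebuild_ure_probability(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Probability of at least one URE while reading every surviving drive."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to keep precision for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Hypothetical arrays: 4 x 2TB drives and 8 x 4TB drives, one drive lost in each.
small = rebuild_ure_probability(surviving_drives=3, drive_bytes=2e12)
large = rebuild_ure_probability(surviving_drives=7, drive_bytes=4e12)
print(round(small, 2), round(large, 2))
```

Under these assumptions the smaller array comes out at roughly 38% and the larger one at roughly 89%, so the risk climbs quickly as drives and arrays grow.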


There is no shortage of articles about RAID-5 on the web. You will see some strong opinions both for and against RAID-5 based on usage cases.

Be sure to understand the application used when reading about both pros and cons of RAID-5.

A reasonable overview of the trade-offs of RAID-5 is this article.

Table 5 below is a quick summary of RAID-5 with a few highlights.


Table 5 - RAID-5 Highlights
RAID Level: RAID-5
Pros:
  • Good data redundancy/availability (can tolerate the loss of 1 drive)
  • Very good read performance since all of the drives can be read at the same time
  • Write performance is adequate (better than RAID-4)
  • Can lose one drive without losing data
Cons:
  • Write performance is adequate (better than RAID-4)
  • Write performance for small I/O is not good at all
Storage Efficiency: (n - 1) / n where n is the number of drives
Minimum Number of disks: 3 (have to be identical)

RAID-6
As mentioned previously, there is a potential problem with RAID-5 for larger capacity drives and a larger number of them.

RAID-6 attempts to help that situation by using two parity blocks per stripe instead of RAID-5’s single parity block.

This allows you to lose two drives without losing any data. Figure 6 below from wikipedia (image by Cburnett) illustrates how the data is written to four disks in RAID-6.
Figure 6: RAID-6 layout (from Cburnett at wikipedia under the GFDL license)

In this figure, the first parity block is noted with a subscript “p” such as Ap. The second parity block in a stripe is noted with a subscript “q” such as Aq.

The use of two parity blocks reduces the usable capacity of a RAID-6 array to the following:
Capacity = min(disk sizes) * (n-2)

meaning that the capacity of RAID-6 is limited by the smallest disk (you can use different size drives in RAID-6) multiplied by the number of drives n, minus two.

The “minus two” part is because of the two parity blocks per stripe.


Computing the first parity block, p, is done in the same fashion as RAID-5. However, computing the q parity block is more complicated as explained here.

This means that the write performance of a RAID-6 array can be slower than a RAID-5 array for a given level of RAID controller performance.

However, read performance from a RAID-6 array is just as fast as a RAID-5 array since reading the parity blocks is skipped. But in exchange for worse write performance, RAID-6 arrays can tolerate the loss of two drives while RAID-5 can only tolerate the loss of a single drive.

Coupled with larger drives and larger drive counts, this means that larger RAID-6 arrays can be constructed relative to RAID-5 arrays.
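To give a feel for why the q parity is more expensive than p, here is a Python sketch of one widely used construction: p is a plain XOR, while q is a weighted sum computed in the Galois field GF(2^8). The field polynomial 0x11d and the generator 2 are assumptions taken from that common construction and are not stated in this article; the function names are hypothetical, and the code works on a single byte per disk to stay short.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    # The nonzero elements form a group of order 255, so a^254 is 1/a.
    return gf_pow(a, 254)

def pq_parity(data_bytes):
    """Return (p, q) for one byte taken from each data disk."""
    p = q = 0
    for i, d in enumerate(data_bytes):
        p ^= d                          # same as RAID-5 parity
        q ^= gf_mul(gf_pow(2, i), d)    # weight each disk by 2^i
    return p, q

def recover_from_q(data_bytes, lost, q):
    """Rebuild data_bytes[lost] using only q (for when p is also lost)."""
    partial = 0
    for i, d in enumerate(data_bytes):
        if i != lost:
            partial ^= gf_mul(gf_pow(2, i), d)
    return gf_mul(partial ^ q, gf_inv(gf_pow(2, lost)))

data = [0x37, 0xa4, 0x5c]
p, q = pq_parity(data)
damaged = [0x37, None, 0x5c]            # data disk 1 and the p block are gone
recovered = recover_from_q(damaged, 1, q)
assert recovered == 0xa4
```

Every byte of q needs a field multiplication rather than a single XOR, which is the extra computational work that shows up as slower writes.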

Table 6 below is a quick summary of RAID-6 with a few highlights.


Table 6 - RAID-6 Highlights
RAID Level: RAID-6
Pros:
  • Excellent data redundancy/availability (can tolerate the loss of 2 drives)
  • Very good read performance since all of the drives can be read at the same time
  • Can lose two drives without losing data
Cons:
  • Write performance is not that good - worse than RAID-5
  • Write performance for small I/O is not good at all
  • More computational horsepower is required for parity computations
Storage Efficiency: (n - 2) / n where n is the number of drives
Minimum Number of disks: 4 (have to be identical)

Hybrid RAID Levels

As you can see, there are some limitations to each of the standard RAID levels (0-6). Some of them have great performance (RAID-0) but pretty awful data availability or redundancy while others have very good data availability and redundancy (RAID-6) but the performance is not so hot.

So as you can imagine, people started to wonder if they couldn’t combine RAID levels to combine features to perhaps achieve better performance while still having very good data redundancy and availability.

This led to what people called Hybrid RAID Levels or what is more commonly called Nested RAID levels.



The topic of Nested RAID levels is fairly lengthy so I will save that for another article. But the basic concept is to combine RAID levels in some fashion.

For example, a common configuration is called RAID 1+0 or RAID-10. The first number (the furthest to the left) refers to the “bottom” or initial part of the RAID array.

Then the second number from the left refers to the “top” level of the RAID array. The top level RAID uses the bottom level RAID configurations as building blocks.


In the case of RAID-10, the approach is to use multiple pairs of drives at the lowest level (RAID-1) and then to combine them using RAID-0.

This retains the goodness of RAID-1 for data availability and redundancy while gaining back some performance from RAID-0 striping.
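As a rough sketch of how RAID-10 combines the two levels, the Python below stripes blocks across mirrored pairs. The pairing of disks as (0,1), (2,3), and so on, the function names, and the assumption of equal-size drives in two-way mirrors are all choices made for this illustration.

```python
def raid10_location(block_index, num_disks):
    """Return (row, [disks holding the block]) for a RAID-10 array.

    Disks are grouped into mirrored pairs (0,1), (2,3), ... and blocks are
    striped across the pairs.
    """
    if num_disks < 4 or num_disks % 2:
        raise ValueError("this sketch expects an even number of disks, at least 4")
    pairs = num_disks // 2
    pair = block_index % pairs        # RAID-0 step: pick a mirrored pair
    row = block_index // pairs
    return row, [2 * pair, 2 * pair + 1]   # RAID-1 step: both disks get a copy

def raid10_capacity(disk_size, num_disks):
    """Usable capacity with equal-size disks and two-way mirrors."""
    return disk_size * (num_disks // 2)

for block in range(4):
    print(block, raid10_location(block, 4))

print(raid10_capacity(500, 4))   # four hypothetical 500GB drives
```

Consecutive blocks go to different pairs, which gives the striping speedup, while each block still exists on two drives.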


Summary

This wraps up our introduction to RAID. For some people it may be new and for many it will be review.





Now that we’ve covered the basics, in coming articles we will be exploring Nested-RAID more in depth, including RAID-01, RAID-5, RAID-6 and RAID-10 configurations.





Have questions about RAID or topics you’d like to see covered? Post them in the comments and we’ll try to incorporate them as we dive deeper into redundant arrays.
