Saturday, March 27, 2010

Replace a failed drive in Linux RAID

A few weeks ago I had the distinct displeasure of waking up to a series of emails indicating that several RAID arrays on a remote system had degraded. The remote system was still running, but one of the hard drives was pretty much dead.

Upon logging in, I found that four of the six RAID devices on one particular drive pair were running in degraded mode: four partitions of the /dev/sdf device had failed, and the two partitions still operational were the /boot and swap partitions (the system runs three RAID1 mirrored pairs, for a total of six physical drives).

Checking the SMART status of /dev/sdf showed that SMART information on the drive could not be read. It was absolutely on its last legs.
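A check along the following lines (using smartctl from the smartmontools package, assuming it is installed) is enough to see whether the drive will even report its attributes:
# smartctl -a /dev/sdf

A healthy drive prints the full SMART attribute table; a dying one, as here, often just returns a read error.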

Luckily, I had a spare 300GB drive with which to replace it, so the removal and restructure of the RAID devices would be easy.

Still working remotely, I had to mark the two operational partitions on /dev/sdf as faulty, which was done using:
# mdadm --manage /dev/md0 --fail /dev/sdf2
# mdadm --manage /dev/md1 --fail /dev/sdf3

Checking the RAID status output, I verified all of the RAID devices associated with /dev/sdf were in a failed state:
# cat /proc/mdstat
Personalities : [raid1]
md6 : active raid1 sdc1[1] sda1[0]
      312568576 blocks [2/2] [UU]
...
md0 : active raid1 sdf2[1](F) sde2[0]
      1959808 blocks [2/1] [U_]

The output above is shortened for brevity as there are eight md devices.
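For a more detailed look at any single array, mdadm can report its state directly; for example:
# mdadm --detail /dev/md0

The output lists each member device along with its state (active, faulty, removed), which is handy for double-checking before removing anything.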

The next step was to remove /dev/sdf from all of the RAID devices:
# mdadm --manage /dev/md0 --remove /dev/sdf2
# mdadm --manage /dev/md1 --remove /dev/sdf3
# mdadm --manage /dev/md2 --remove /dev/sdf5
...

Once all of the /dev/sdf devices were removed, the system could be halted and the physical drive replaced.
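The halt itself was nothing exotic, something along the lines of:
# shutdown -h now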

If you do not have a replacement drive of exactly the same size, use a larger one; if the replacement drive is smaller, adding its partitions back into the arrays will fail.
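One quick sanity check, once the new drive is in place, is to compare its raw size against the surviving drive before partitioning (assuming blockdev from util-linux is available, as it is on most distributions):
# blockdev --getsize64 /dev/sde
# blockdev --getsize64 /dev/sdf

Both commands print the device size in bytes; the new drive's number needs to be at least as large as the old one's.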

Once the drive was replaced and the system powered back on, it booted without a problem, and from there it was a matter of recreating the old drive's partition layout on the new drive.

Because this was a mirrored RAID1 series of arrays, we could use the working drive (/dev/sde) as a template:
# sfdisk -d /dev/sde | sfdisk /dev/sdf

This creates the exact same partition layout on /dev/sdf as exists on /dev/sde. Once this is done, run fdisk -l on each drive to verify the partition layout is identical.
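The verification itself is just a matter of eyeballing the two listings side by side:
# fdisk -l /dev/sde
# fdisk -l /dev/sdf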

The next and final step is to add all of the new partitions to the existing RAID arrays. This is done using:
# mdadm --manage /dev/md0 --add /dev/sdf2
# mdadm --manage /dev/md1 --add /dev/sdf3
# mdadm --manage /dev/md2 --add /dev/sdf5
...

As each new partition is added, the data in that array is reconstructed onto it from the surviving mirror.

Depending on the size of the partition, the re-sync could take a few minutes to a few hours. You can cat /proc/mdstat to see the progress.
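If you would rather not keep re-running cat by hand, something like watch works nicely:
# watch -n 5 cat /proc/mdstat

The recovery line shows the percentage complete along with an estimated finish time and the current rebuild speed.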

With the size of drives available today, my primary concern is data integrity, and for that, nothing beats RAID1.

The hardest part of replacing and reconstructing the RAID arrays was figuring out which of the six physical drives in the system was the faulty one, so the right disk could be pulled and replaced.
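One trick that helps with that, assuming the drive is still healthy enough to answer identify requests and hdparm is installed (smartctl -i gives the same information), is to pull the serial number off the suspect device and match it against the label printed on the physical drive:
# hdparm -I /dev/sdf | grep -i serial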

The longest part was the reconstruction, but that runs in the background; the system may feel a little sluggish while it completes, yet it stays online and available.

The total downtime of this exercise was perhaps 20 minutes. If uptime and data integrity are important, seriously consider using RAID1.

It has saved me from dying or faulty hardware numerous times, and the effort required to use it is minimal.
