Files disappearing on Apache Tomcat server using RAID

Q We've set up an Apache Tomcat server with two 500 GB drives using software RAID 1. I made a few changes to some files, restarted the server to test them, and found that my changes were gone. Some files I had deleted had also reappeared. When I checked my mail, I found error reports from mdadm:

A DegradedArray event had been detected on md device /dev/md0.
The /proc/mdstat file currently contains the following:
Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1]
      1959808 blocks [2/2] [UU]
md0 : active raid1 sda1[0]
      486424000 blocks [2/1] [U_]
unused devices: <none>

I'm making a backup of all the important information, but if possible I'd like to salvage the server, since the setup was very specific and time-consuming. I'm new to the world of Linux administration and unsure where to start.

A The contents of /proc/mdstat show that the md0 array is now running on /dev/sda1 alone, so one of its members has failed (almost certainly /dev/sdb1). This may also explain your vanishing changes: if the machine assembled the array from the stale disk on some boots, edits written to one disk would be missing, and deleted files present again, whenever the other was used. Your machine will continue to function with a degraded array, but with slightly reduced performance and no safeguard against another disk failure. There are a number of tools available to test the disk, but the safest option is to replace it and rebuild your arrays. This also means replacing /dev/sdb2 of course, so the other array will have to be rebuilt too. Fortunately, this is a simple and largely automatic task, although it can take a while.
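Before doing anything else, it is worth confirming which member has dropped out. A minimal check, assuming the device names from your mdstat output:

mdadm --detail /dev/md0
mdadm --examine /dev/sdb1

The first command reports the state of the array and each of its members (the failed partition shows up as removed or faulty); the second reads the RAID superblock from the suspect partition itself.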

You can also continue to use the computer after replacing the faulty disk while the arrays are being rebuilt, but disk performance will be noticeably reduced until the rebuild finishes. It is easiest if you can add the new disk before removing the old one, as this means you can rebuild md0 first, then switch md1 to the new disk at your convenience. Assuming your new disk appears as /dev/sdc, connect it up and reboot, then partition it as you did sda and sdb, setting the partition types to Linux RAID Autodetect (type fd); a quick way to do this is sketched below.
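If the new disk is at least as large as the old ones, you can clone the existing partition table rather than recreating it by hand. This is a minimal sketch using sfdisk, assuming the device names above; check the result with fdisk -l before going any further:

sfdisk -d /dev/sda | sfdisk /dev/sdc

The first sfdisk dumps sda's partition table as text and the second writes it to the new disk, giving you identical partition sizes and types. With the partitions in place, run these commands as root, to remove the faulty disk from the array and add the new one: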

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md0 --add /dev/sdc1

When the new disk is added to the array, the RAID driver will synchronise it with the existing disk. This can take a while; monitor the contents of /proc/mdstat to follow its progress (a couple of convenient ways to do so are sketched below). When the process is complete you'll have both arrays working correctly, but spread across three disks, one of them of suspect reliability, so repeat the above commands for md1, sdb2 and sdc2 to transfer the other array to the new disk. Now you can power down and remove the faulty disk whenever it suits you, as it is no longer in use. Needless to say, as with any critical disk operation, you should ensure your data is backed up before you attempt any of this.
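A minimal sketch of both steps, assuming the same device names as above. To watch the rebuild, either of these will do; watch simply re-runs the command every two seconds, while mdadm --detail gives a fuller report, including the rebuild percentage:

watch cat /proc/mdstat
mdadm --detail /dev/md0

Once md0 is clean, the commands for the second array mirror the earlier ones:

mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md1 --add /dev/sdc2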

You can check the old disk either with smartmontools (http://smartmontools.sourceforge.net), which is probably available in your distro's repositories, or with a diagnostic tool from the manufacturer's web site. Most manufacturers provide one that runs from a bootable floppy disk, and you will need to run it if the disk is to be returned under warranty. If the computer has no floppy drive, most of these diagnostic programs can also be run from the Ultimate Boot CD (www.ultimatebootcd.com).
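With smartmontools installed, a quick check of the suspect disk might look like this (assuming it is still connected as /dev/sdb):

smartctl -a /dev/sdb
smartctl -t long /dev/sdb
smartctl -l selftest /dev/sdb

The first command prints the disk's SMART health status, attributes and error log; the second starts an extended self-test, which runs in the background and can take an hour or more; the third shows the results once the test has finished.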
