File downloads not matching their MD5 checksums

Q We are getting very strange corruptions whenever we download files bigger than 100MB on to a new server running Red Hat Enterprise Linux ES 3. The files download to the server without reporting any errors, but the MD5 checksum of the downloaded file does not match that of the source. My colleagues argue that this is a hardware failure, but the vendor insists that this is not the case. We have looked through /var/log/messages and dmesg for signs of errors but could not find any.

As an act of goodwill, the single IDE disk and associated cables have been replaced and the operating system re-installed without any errors. However, on downloading massive attachments we started experiencing the same errors. We are very confused.

A I have come across a very similar situation just once. The issue was inconclusively diagnosed as a faulty motherboard, potentially the onboard IDE controller. To rule out network transfer problems, we first copied the file over SSH (using SCP or SFTP). As the corruption was still present, we then generated a 512MB file of random data locally:

$ openssl rand -out testdata.0 536870912

The generated file was then copied four more times to create a test sample:

$ for FOO in 1 2 3 4; do cp -v testdata.0 testdata.${FOO}; done

The MD5 checksums of the five 'theoretically identical' files were then computed and compared:

$ md5sum testdata.?
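The three steps above can be wrapped into one repeatable check. The following is only a minimal sketch: the function name check_copies, its two arguments and the use of a temporary directory are illustrative assumptions, not part of the original diagnosis. It succeeds only if every copy has the same MD5 checksum as the original.

```shell
#!/bin/sh
# Sketch: write one file of random data, copy it several times and
# count how many distinct MD5 checksums the set of files produces.
# check_copies and its arguments are hypothetical names for illustration.

check_copies() {
    size=$1      # bytes of random data to generate
    copies=$2    # number of additional copies to make
    dir=$(mktemp -d) || return 2

    openssl rand -out "$dir/testdata.0" "$size" || return 2

    i=1
    while [ "$i" -le "$copies" ]; do
        cp "$dir/testdata.0" "$dir/testdata.$i" || return 2
        i=$((i + 1))
    done

    # Exactly one distinct checksum means every copy matched the original.
    distinct=$(md5sum "$dir"/testdata.* | awk '{print $1}' | sort -u | awk 'END {print NR}')
    rm -rf "$dir"
    [ "$distinct" -eq 1 ]
}
```

To mirror the test described here, it could be called as check_copies 536870912 4, with a non-zero exit status indicating that at least one copy differed or that a step failed.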

In this case the MD5 checksums did not match. As in your case, the disk was replaced but the problem persisted; only when the motherboard was swapped did it go away.

The kernel-utils package on ES 3 provides the SMART monitoring daemon (smartd), which can monitor the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most modern ATA drives. Using SMART, it may be possible to single out a failing disk before it fails completely. You could also try disabling DMA on the drive and repeating the test:

# /sbin/hdparm -d0 /dev/hda
