Re: weird raid 1 problem

From: Tomi Orava
Date: Wed Sep 10 2003 - 14:05:19 EST



> On Maw, 2003-09-09 at 18:59, Ying-Hung Chen wrote:
>> the corrupted files seem to 'recover' by themselves if I leave the
>> machine alone for a while or umount and mount back the filesystem.
>>
>> does anyone have this type of temporary file corruption problem? I
>> tested it against 2.4.2x kernels including the latest vanilla 2.4.22 +
>> xfs patches, and they all seem to have the same problem
>
> Classic symptoms of bad memory or a kernel bug corrupting data. See if
> the box passes memtest86 as a starter

I actually saw the same thing happen to me a week ago with
linux-2.4.23-pre1+xfs (xfs bk).

The really weird thing was that I copied several big files from one
disk array to another and ran md5sum before & after the copy. The files
were intact before copying, but the checksums failed right after the copy.
When I started investigating what the hell was wrong, the next md5sum on
the destination filesystem was successful ... and that was it (no more
errors).
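
For anyone who wants to repeat this kind of check, here is a minimal
sketch in Python of the "checksum, copy, checksum again" procedure
described above. It is not what I actually ran (I used plain md5sum by
hand); the function names and the number of re-reads are made up for
illustration. Note that a re-read may well be served from the page cache
rather than from the disks, so unmounting and remounting between reads is
probably needed to really exercise the array.

```python
import hashlib
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst, rereads=2):
    """Checksum src, copy it to dst, then checksum dst several times.

    Returns (source_digest, [destination_digests...]).
    """
    before = md5_of(src)
    shutil.copyfile(src, dst)
    after = [md5_of(dst) for _ in range(rereads)]
    return before, after


def classify(before, after):
    """Label the outcome (hypothetical labels, for illustration only).

    "ok"         - every read of the copy matched the source
    "transient"  - an earlier read mismatched, but the last one matched
    "persistent" - the last read still does not match the source
    """
    if all(h == before for h in after):
        return "ok"
    if after[-1] == before:
        return "transient"
    return "persistent"
```

What I saw would come out as "transient" here: the first checksum of the
copy was wrong and the next one was fine.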

The source filesystem was a two-disk RAID1 array with XFS (Sil680).
The destination filesystem was a four-disk RAID1+0 array, also with XFS
(HPT374 with the HighPoint binary driver 1.10, as 2.4.22 still doesn't work
at all with the Epox 8K9A3+ motherboard's integrated HPT374 with DMA).

I ran the latest memtest86 for 24h (29 passes) without any errors.

Regards,
Tomi Orava

