Re: amd64 sata_nv (massive) memory corruption

From: John Stoffel
Date: Sat Aug 02 2008 - 16:09:19 EST


>>>>> "Linas" == Linas Vepstas <linasvepstas@xxxxxxxxx> writes:

Linas> 2008/8/1 Alistair John Strachan <alistair@xxxxxxxxxxxxx>:
>> On Friday 01 August 2008 18:30:34 Linas Vepstas wrote:
>>> Hi,
>>>
>>> I'm seeing strong, easily reproducible (and silent) corruption on a
>>> sata-attached disk drive on an amd64 board. It might be the disk
>>> itself, but I doubt it; googling suggests that it's somehow
>>> iommu-related but I cannot confirm this.
>>
>> Nowhere do you explicitly say you have memtest86'ed the RAM.

Linas> It passes memtest86+ just fine. The system has been in heavy
Linas> use doing big science calculations on big datasets (multi-gigabyte)
Linas> for months; these do not get corrupted when copied/moved around
Linas> on the old parallel IDE disk, nor moving/copying on an NFS mount
Linas> to a file server. Only the SATA disk is misbehaving.

Can you post the output of dmesg after a boot, so we can see which
driver is being used? I assume it's the new libata stuff, but maybe
you can turn on debugging in there as well. Stuff like SCSI_DEBUG
(in the SCSI menus) might show us more details here.

Also, have you tried a new SATA cable by any chance? That's obviously
the cheaper path than getting a new disk...
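
If you do swap the cable, it helps to have a repeatable check for the
corruption before and after. Below is a rough sketch of one, assuming
Python is available on the box: copy a large file onto the SATA disk
and compare SHA-256 checksums. The function names and the chunk size
are made up for illustration; nothing here comes from your report.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, ask the kernel to flush dst, then compare checksums.

    Caveat: the read-back may be served from the page cache rather than
    the disk. For a meaningful test, use a file larger than RAM or drop
    the caches first (as root: echo 3 > /proc/sys/vm/drop_caches).
    """
    shutil.copyfile(src, dst)
    with open(dst, "rb") as f:
        os.fsync(f.fileno())  # push the copied data out to the device
    return sha256_of(src) == sha256_of(dst)
```

Call copy_and_verify() with a source file on the known-good IDE disk
and a destination on the SATA disk; a False result means the copy
differs from the original.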

Good luck,
John
