Re: 2.6.28.9: EXT3/NFS inodes corruption

From: Sylvain Rochet
Date: Fri Aug 21 2009 - 10:32:30 EST


Hi,

On Fri, Aug 21, 2009 at 12:05:10PM +0100, Daniel J Blueman wrote:
>
> The reason I ask, I was chasing data corruption across the PCIe bus
> with some high-performance Quadrics interconnect adapters a while ago.
> The reproducer involved multiple outstanding main memory read requests
> to related addresses and a small block of data would be returned from
> the wrong offset.
>
> In the end, I found the nVidia CK804 (also MCP55) HT->PCIe bridge was
> at fault and later found disk corruption when doing heavy rsyncs to
> network. This was never publicly acknowledged, but I guess it
> illustrates the need for some micro-tests to verify data-soundness
> under duress; it took a day (and petabytes of data) of the production
> I/O workload to get this data corruption, and 3 seconds with the right
> reproducer, (still non-trivial to catch on a PCIe protocol analyser).
>
> Sometime I'll develop a stress-test driver for a common SATA or
> network controller to drive its DMA engine with I/O patterns to and
> from main memory, checking the data integrity every few seconds; this
> could be generalised with OpenGL nicely for graphics cards on
> workstations I imagine.

Hehe, sounds interesting.
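
For what it's worth, the "write a known pattern, read it back, compare" idea
can be sketched in userspace. This is only a rough illustration, not the
driver-level test described above: the function names (make_block,
write_pattern, check_pattern) and the block sizes are made up for the example,
and since the read-back will usually be served from the page cache it does not
by itself exercise a controller's DMA path. A real test would need to bypass
the cache or live in a driver.

```python
import hashlib
import os
import tempfile


def make_block(seed, size):
    """Return a deterministic pseudo-random block derived from the seed."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def write_pattern(path, blocks=64, size=4096):
    """Write `blocks` patterned blocks to `path` and force them to storage."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(make_block(i, size))
        f.flush()
        os.fsync(f.fileno())


def check_pattern(path, blocks=64, size=4096):
    """Re-read the file and return the indices of blocks that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(size) != make_block(i, size):
                bad.append(i)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "pattern.bin")
        write_pattern(target)
        print("mismatched blocks:", check_pattern(target))
```

Because each block is regenerated from its index, a mismatch also tells you
which offset went wrong, which is the useful bit when data comes back from the
wrong place.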

Sylvain
