Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040

From: Harri Olin
Date: Tue Oct 06 2009 - 08:34:52 EST


Mark Lord wrote:
Bernie Innocenti wrote:
The error in the subject appears in the console immediately followed by
a hard freeze of the machine. The error occurs reproducibly on two
identical Opteron servers, each one equipped with two identical
controller cards:

03:04.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
03:06.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

We can trigger the problem within a few seconds by starting a
reconstruction on a drive hooked to port 4 (counting from 0) of the
second controller. Oddly, every other drive works reliably and the
faulty drive works if we connect it to, for example, port 4 of the first
controller.

Tested with Debian kernels 2.6.26-19 and 2.6.30-8. Let me know if
further details are needed.
..
0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040..
..

0x30000040 here means "MRdPerr":
"bad data parity detected during PCI master read".

Which means that a data parity error happened
during an outgoing data transfer on the PCI-X bus.
This could happen due to noise on the bus,
dying capacitors, or (?) bad RAM (not sure about the last one).
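
As a small aside on the cause value quoted above: the snippet below only
lists which bits are set in 0x30000040. It is a minimal sketch; the helper
name set_bits is made up for illustration, and the mapping from individual
bits to names such as "MRdPerr" is not given in this thread, so the code
does not attempt it.

```python
def set_bits(value, width=32):
    """Return the positions of the bits that are set in value (LSB = bit 0)."""
    return [bit for bit in range(width) if value & (1 << bit)]

# Cause value taken from the error message in the subject line.
cause = 0x30000040
print(set_bits(cause))  # -> [6, 28, 29]
```

The bit positions can then be looked up against the controller's register
documentation or the definitions in the driver source.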

I have heard of the same thing happening with the same kind of configuration, using a Supermicro H8DME-2 motherboard and an Opteron 2378 CPU.

Even the controllers were in the same slots.

My initial suspicion was that the motherboard does not drop the PCI-X bus frequency to 100MHz and keeps driving the bus at 133MHz even though there are two controllers connected. The proposed fix was to move one of the controllers to the other bus, as the H8DME-2 has four PCI-X slots (2x100MHz and 2x133MHz), but I haven't yet heard back whether it helped.
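
One way to check what frequency a PCI-X bus segment is actually running at
is to look at the bridge's "Secondary Status" line in the output of
lspci -vv. The sketch below pulls the Freq= field out of such text. The
helper name pcix_bridge_freqs and the sample text are made up for
illustration, and the exact line format is an assumption about lspci's
output that may differ between pciutils versions.

```python
import re

def pcix_bridge_freqs(lspci_text):
    """Return the Freq= values found on 'Secondary Status' lines."""
    return re.findall(r"Secondary Status:.*?Freq=(\S+)", lspci_text)

# Hypothetical excerpt in the assumed format; not taken from the affected machines.
sample = """\
Capabilities: [a0] PCI-X bridge device
    Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=133MHz
"""
print(pcix_bridge_freqs(sample))
```

On a real system the text would come from running lspci -vv as root and
looking at the bridge above the two controllers.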

Even the kernel was the same: the latest Debian distribution kernel. It might be worthwhile to try a vanilla kernel.org kernel if possible.

At home I have two 6081 controllers on the same bus, but at 100MHz, and no problems so far.

--
Harri.


