RE: Disk Corruption with ide hpt-366 controller & software raid5

Andre Hedrick (andre@suse.com)
Mon, 13 Sep 1999 21:09:29 -0700 (PDT)


On Mon, 13 Sep 1999, Tom Livingston wrote:

> > Are you sure?
>
> Considering the cc: on this list, I am loathe to say I am completely sure.
> But yes, basically, I am completely sure. Every time I do a dd | md5sum it
> returns a different value, where when the raid only uses the pdc the value
> is always the same.

Don't worry about the CC list. This issue overrides all.
There are a few quirks I found last night.

AGP Card Slot
PCI 1 PDC20246: IDE controller on PCI bus 00 dev 78 hdm/n/o/p
PCI 2 PDC20246: IDE controller on PCI bus 00 dev 68 hdi/j/k/l
PCI 3 PDC20246: IDE controller on PCI bus 00 dev 58 hde/f/g/h
PCI 4
PCI 5

Update "2.2.12.uniform-ide-6.20.hydra.patch.gz" to straight
"ide.2.2.12.patch.gz"

There is a change or two that enables a flag or two.

Next move card in pci slot 3 to slot 4.
Next turn off the irq in the BIOS for the AGP video.

Due to the pci-irq routing table design, there is a potential mess.
Some of this is not documented in the shipped manual.

: PIRQ 0 : PIRQ 1 : PIRQ 2 : PIRQ 3 : PIRQ 4
AGP & Slot 1 : A : B : C : D : ?
Slot 2 : D : A : B : C : ?
Slot 3 & HPT366 : C : D : A : B : ?
Slot 4 & Slot 5 : B : C : D : A : ?
USB :

I am not sure if the kernel will mix shared interrupts yet of this nature.
If you move the primary Promise card to slot 4 and repeat the test,
it will tell me if it is in the ide-driver core or in the chipset.
I suspect the former and not the later.

If this is the case, the hunt may expand to other parts of the kernel
down to the "arch" specific handling.

The next test (dangerous) is to pair off two cards into slots 4 and 5.
This will verify it is not chipset, but driver core.

If this more than you want to try I understand, I just need a quick class
in raid setup (had not planned to do this yet) so I can use my drives that
are for testing, wrecking, and trashing. I have the Promise cards and
access to Promise's engineers. I managed to find the one assigned to
"Linux".

> * This behavior definitely happens in uni-processor mode
> * It turns out you don't even need RAID to cause this problem, you just need
> very heavy IO load.

This makes me think it is more than chipset specific and in the
request/services against a mainboard design. A quirk....

> * This behavior never causes a crash, only corrupted reads (unsure about
> writes)
> * I cannot duplicate this behavior at all with the PIIX4 or any of the
> PDC20246's connected to the same machine

Again, these are not overlapped in the irqs.

> There were a couple of other emails in this thread where I give some more
> information, if you missed those I'd be happy to re-send.

Not needed, found them.

Andre Hedrick
The Linux IDE guy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/