RE: Disk Corruption with ide hpt-366 controller & software raid5

Tom Livingston (tsl@volition.org)
Tue, 14 Sep 1999 00:10:14 -0700


Andre Hedrick wrote:
> AGP Card Slot
> PCI 1 PDC20246: IDE controller on PCI bus 00 dev 78 hdm/n/o/p
> PCI 2 PDC20246: IDE controller on PCI bus 00 dev 68 hdi/j/k/l
> PCI 3 PDC20246: IDE controller on PCI bus 00 dev 58 hde/f/g/h
> PCI 4
> PCI 5
>
> Update "2.2.12.uniform-ide-6.20.hydra.patch.gz" to straight
> "ide.2.2.12.patch.gz"

The machine is unfortunately at a colocation point, so it may take me until
next weekend that I can try any more tests. I was there an ungodly amount
of time this weekend.

However, I did do some of these tests last night, but with the hydra patch,
so I'll report on the outcome there.

note that actual PCI setup is:
AGP Slot: Trident 975 (IRQ disabled)
PCI 1 PDC20246: IDE controller on PCI bus 00 dev 78 hdm/n/o/p
PCI 2 PDC20246: IDE controller on PCI bus 00 dev 68 hdi/j/k/l
PCI 3 PDC20246: IDE controller on PCI bus 00 dev 58 hde/f/g/h
PCI 4 Empty
PCI 5 3com 3c905b 10/100 ethernet (IRQ shared with PCI 5)

When I moved the PDC in slot 3 to slot 4, hde/f/g/h were unavailable, driver
wasn't able to read them. This is apparently because of the 3com card
sharing the IRQ. Note that any card I put in PCI 4 won't work at this
point, a second ethernet card in PCI 4 causes a lockup within a few minutes.

So I pulled the ethernet card, so I was pci1: pdc, pci2: pdc, pci3: empty,
pci4: pdc, pci5: empty. I ran the same tests, and I still saw corruption
when reading from the drive on the HPT-366 whether or not in was in the raid
array.

You did notice that I am using maxtor 9xxx series UDMA hard drives, right?
I was worried that the issues with the "broken ASIC" were popping up.

> The next test (dangerous) is to pair off two cards into slots 4 and 5.
> This will verify it is not chipset, but driver core.

I didn't do this test, I will try to schedule it soon.

I mentioned that the machine is at a colocation point. I have two other
abit BP-6 system, though both use Tekram SCSI controllers instead of ide.
Since it will be hard to test these issues with the production box, I will
see what I can do to get some more maxtor 9xxx series drives and hook them
up to one of my development servers here. I'm not going to be able to
duplicate the whole environment, but hopefully if it's a core problem I'll
be able to duplicate the defect. I'll get to working on that, but it'll
probably take a couple of days before it's all in place to see if I can get
it to fall over as well.

You may have noticed my report of the same system using IDE & SMP being
extremely unstable, it will crash within minutes of heavy load... Do you
think these problems are related?

Thank you!

Tom

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/