md034 + 1.3.77 IDE interrupt timeouts still

J.J. Burgess (92jjb@eng.cam.ac.uk)
Sat, 23 Mar 1996 19:47:09 +0000 (GMT)


With md034 + 1.3.77 the machine still locks up when I use a raid0 on 2 IDE
drives, while compiling kernel source on the md device. It often locks
during the link stage, which probably issues many requests spread all over
the device at once.

# mdtab entry for /dev/md0
/dev/md0 raid0,4k,0,7ca3bde1 /dev/hda5 /dev/hdb5
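A rough sketch of how a raid0 with this entry maps md0 sectors onto its two members may help explain why the link stage keeps both drives busy. It reads the `4k` field as the chunk size, assumes 512-byte sectors and chunks alternating between hda5 and hdb5, and assumes equally sized members, so it only models the striped region both partitions share. The names (`raid0_map`, `struct stripe_loc`) are hypothetical and are not md's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical raid0 mapping sketch: 4 KB chunks of 512-byte sectors,
   alternating between member 0 (hda5) and member 1 (hdb5).
   Assumes equal-sized members, i.e. only the region both disks share. */
#define SECTOR_SIZE   512u
#define CHUNK_BYTES   4096u
#define CHUNK_SECTORS (CHUNK_BYTES / SECTOR_SIZE)  /* 8 sectors per chunk */
#define NR_DISKS      2u

struct stripe_loc {
    unsigned dev;     /* member index: 0 = hda5, 1 = hdb5 */
    uint64_t sector;  /* sector offset within that member */
};

struct stripe_loc raid0_map(uint64_t md_sector)
{
    uint64_t chunk  = md_sector / CHUNK_SECTORS;  /* chunk number on md0 */
    uint64_t offset = md_sector % CHUNK_SECTORS;  /* sector within the chunk */
    struct stripe_loc loc;

    /* Chunks are dealt out round-robin across the members. */
    loc.dev    = (unsigned)(chunk % NR_DISKS);
    /* Each member holds every NR_DISKS-th chunk, packed contiguously. */
    loc.sector = (chunk / NR_DISKS) * CHUNK_SECTORS + offset;
    assert(loc.dev < NR_DISKS);
    return loc;
}
```

With 8-sector chunks, any transfer longer than 4 KB, or one that crosses a chunk boundary, is split between hda5 and hdb5. Since hda and hdb are the master and slave on the same IDE interface, a burst of scattered requests from the linker keeps that one channel continuously busy.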

The error is the same timeout as we had before:

hdb: irq timeout status=0x58 {DriveReady SeekComplete DataRequest}

Whereupon the machine freezes entirely. My guess is that a kernel paging
request failed, and a failed paging request probably has serious potential
to lock up the kernel at the moment. The system should handle this a bit
more gracefully, i.e. reset the IDE bus (as a SCSI reset does) and retry
the requests; what it should not do is lock up the whole computer/kernel.
This kind of recovery is really needed before the kernel moves to 2.0.
It would mean people getting annoyed at their computer slowing down, with
a couple of kernel warnings saying 'IDE timeout - bus reset', rather than
having their entire machine die on them. I'd personally much rather have
the former.
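The reset-and-retry behaviour suggested above could look roughly like this. It is a hypothetical user-space simulation, not code from the IDE driver: `struct fake_drive`, `issue_and_wait()` and `reset_bus()` stand in for issuing a request, waiting for its interrupt and resetting the interface, and `MAX_RETRIES` is an arbitrary limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policy sketch, not the real IDE driver: on an interrupt
   timeout, reset the bus and retry a bounded number of times, then fail
   just this request instead of hanging the whole machine. */
#define MAX_RETRIES 3

enum req_result { REQ_OK, REQ_FAILED };

/* Simulated drive: the next `timeouts_left` attempts will time out. */
struct fake_drive {
    int timeouts_left;  /* injected IRQ timeouts still to come */
    int resets;         /* bus resets performed so far */
};

/* Stand-in for issuing a request and waiting for its interrupt.
   Returns true if the (simulated) interrupt arrived in time. */
static bool issue_and_wait(struct fake_drive *d)
{
    if (d->timeouts_left > 0) {
        d->timeouts_left--;
        return false;  /* simulated IRQ timeout */
    }
    return true;
}

/* Stand-in for resetting the interface, with the warning the user would see. */
static void reset_bus(struct fake_drive *d)
{
    d->resets++;
    fprintf(stderr, "IDE timeout - bus reset\n");
}

enum req_result do_request(struct fake_drive *d)
{
    int attempt;

    assert(d != NULL);
    /* One initial attempt plus up to MAX_RETRIES retries. */
    for (attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        if (issue_and_wait(d))
            return REQ_OK;
        reset_bus(d);  /* recover the interface before trying again */
    }
    /* Give up on this request only; the caller gets an I/O error. */
    return REQ_FAILED;
}
```

The important design point is the final `return REQ_FAILED`: after a bounded number of resets, only the one request fails and is reported as an error, instead of the machine waiting forever for an interrupt that never comes.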

It probably wouldn't be easy, especially if swap requests start to fail
or time out, but the kernel should still _never_ crash or lock up.
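For reference, the status byte in the hdb timeout message, 0x58, is the ATA status register with three bits set: 0x40 (DriveReady), 0x10 (SeekComplete) and 0x08 (DataRequest). With DataRequest set and Busy clear, the drive appears to have been ready for a data transfer when the interrupt timed out. The decoder below is a small sketch; `decode_status()` is a hypothetical helper, and the names for bits that do not appear in the message (Busy, DeviceFault, CorrectedError, Index, Error) are assumed to follow the same style.

```c
#include <assert.h>
#include <string.h>

/* ATA status register bits, highest first. Names for bits not shown in
   the timeout message above are assumptions. */
struct status_bit {
    unsigned char mask;
    const char *name;
};

static const struct status_bit ata_status_bits[] = {
    { 0x80, "Busy" },
    { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },
    { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },
};

/* Writes the names of the set bits, space separated, into buf
   (truncating safely if buf is too small). */
void decode_status(unsigned char status, char *buf, size_t len)
{
    size_t i;

    assert(len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof ata_status_bits / sizeof ata_status_bits[0]; i++) {
        if (!(status & ata_status_bits[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, ata_status_bits[i].name, len - strlen(buf) - 1);
    }
}
```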

In case it matters, here is the information about my drives:

/dev/hda:

Model=QUANTUM FIREBALL1080A, FwRev=A1M.0900, SerialNo=24153061
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=2112/16/63, TrkSize=32256, SectSize=512, ECCbytes=4
BuffType=3(DualPortCache), BuffSize=83KB, MaxMultSect=8, MultSect=4
DblWordIO=no, maxPIO=2(fast), DMA=yes, maxDMA=2(fast)
CurCHS=2112/16/63, CurSects=2128896, LBA=yes, LBAsects=2128896
tDMA={min:120,rec:333}, DMA modes: sword0 sword1 *sword2 mword0 mword1 *mword2
IORDY=on/off, tPIO={min:333,w/IORDY:120}, PIO modes: mode3 mode4

multcount = 4 (on)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 16 (on)
geometry = 2112/16/63, sectors = 2128896, start = 0

/dev/hdb:

Model=Conner Peripherals 850MB - CFS850A, FwRev=0.28, SerialNo=ETC4MJH
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=1651/16/63, TrkSize=40887, SectSize=649, ECCbytes=4
BuffType=3(DualPortCache), BuffSize=64KB, MaxMultSect=16, MultSect=4
DblWordIO=no, maxPIO=1(medium), DMA=yes, maxDMA=1(medium)
CurCHS=1651/16/63, CurSects=1664208, LBA=yes, LBAsects=1664583
tDMA={min:120,rec:120}, DMA modes: *mword0 *mword1 mword2
IORDY=on/off, tPIO={min:270,w/IORDY:120}, PIO modes: mode3 mode4

multcount = 4 (on)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 16 (on)
geometry = 1651/16/63, sectors = 1664583, start = 0
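As a sanity check on the listings above, the current CHS geometry multiplies out to the reported sector count on both drives: 2112 * 16 * 63 = 2128896 for hda and 1651 * 16 * 63 = 1664208 for hdb. On hdb the LBA count (1664583) is 375 sectors larger, and that larger figure is what the `geometry` line reports. The helpers below are hypothetical and assume 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Sectors addressable through a CHS geometry. */
uint32_t chs_sectors(uint32_t cyl, uint32_t heads, uint32_t spt)
{
    assert(heads > 0 && spt > 0);
    return cyl * heads * spt;
}

/* Capacity in bytes, assuming 512-byte sectors. */
uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors * 512u;
}
```

At 512 bytes per sector, 2128896 sectors is 1089994752 bytes and 1664583 sectors is 852266496 bytes, in line with the 1080 and 850 MB model names.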

Note: I've set 'hdparm -X34 -d1 /dev/hdb', as recommended by M. Lord,
which seemed to make these interrupt timeouts less likely, although
I'll experiment with this further. I believe this is because the Conner
drive was not being set to the correct transfer mode before; I don't
think anything has changed in the kernel that affects how the drive is
auto-probed.
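The `-X34` in that command is an ATA transfer-mode value: the mode number sits in the low three bits and the transfer type in the bits above it, so 8+n selects PIO flow-control mode n, 16+n single-word DMA mode n, 32+n multiword DMA mode n and 64+n Ultra DMA mode n. `-X34` (32+2) therefore requests multiword DMA mode 2, listed for hdb as mword2, and `-d1` switches `using_dma` on. `describe_xfer_mode()` is a hypothetical helper, and this sketch rejects values outside those four ranges, including the 0-7 "PIO default" values.

```c
#include <assert.h>
#include <stdio.h>

/* Decodes an ATA transfer-mode value such as the argument to hdparm -X.
   Returns 0 and fills buf on success, -1 for values this sketch does not
   handle. */
int describe_xfer_mode(unsigned value, char *buf, size_t len)
{
    unsigned mode = value & 0x07u;  /* low 3 bits: mode number */
    const char *kind;

    assert(len > 0);
    switch (value & ~0x07u) {       /* remaining bits: transfer type */
    case 0x08: kind = "PIO";             break;
    case 0x10: kind = "single-word DMA"; break;
    case 0x20: kind = "multiword DMA";   break;
    case 0x40: kind = "Ultra DMA";       break;
    default:   return -1;
    }
    snprintf(buf, len, "%s mode %u", kind, mode);
    return 0;
}
```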

Thanks.

.. . . . . . . . . . . . . . ..
:: : : Jon Burgess 01223-461907 : : ::
:: : jjb1003@cam.ac.uk : : ::
:: : : : : : : : : : : : : : ::