IDE DMA/irq timeouts

Nick Burrett (nick@dsvr.net)
23 Sep 1999 15:10:53 +0100


Hi,

Since upgrading our servers to kernel versions 2.2.10 and 2.2.12, I have
noticed increasing instability with the hard discs.

The servers I use (all single processor configurations) present the
following errors:

hdb: irq timeout: status=0xd0 { Busy }
hda: DMA disabled
ide0: reset: success
hdb: irq timeout: status=0xd0 { Busy }
ide0: reset: success
hdb: irq timeout: status=0xd0 { Busy }
ide0: reset: success

and

hdc: timeout waiting for DMA
hdc: irq timeout: status=0xd0 { Busy }
hdc: DMA disabled
ide1: reset: success

and

hda: dma_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
hda: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=3487119, sector=1452
hda: DMA disabled
hdb: DMA disabled
ide0: reset: success
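
For anyone reading the hex values in the messages above, here is a minimal
Python sketch of how the status and error bytes break down into the flag
names shown in braces. The bit masks follow the usual ATA status and error
register layout. The names of flags that do not appear in the logs above are
an assumption, and decode_status and decode_error are hypothetical helper
names, not kernel functions.

```python
# ATA status register bits, highest bit first (names beyond those seen in
# the logs above are assumed).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# ATA error register bits, highest bit first (same caveat on names).
ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(value):
    """Return the flag names set in a status byte.

    While the Busy bit is set the remaining bits are not meaningful,
    so only "Busy" is reported in that case.
    """
    if value & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_error(value):
    """Return the flag names set in an error byte."""
    return [name for mask, name in ERROR_BITS if value & mask]


print(decode_status(0xD0))
print(decode_status(0x71))
print(decode_error(0x10))
```

With this layout, 0xd0 decodes to Busy alone, and 0x71 with error 0x10
decodes to the same flags the kernel printed for hda.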

I originally suspected that this was a problem with the hard discs as the
BIOS has never detected FUJITSU MPC3102AT hard discs correctly (the BIOS
recognises them as 648Mb hard discs, where they are actually 9765Mb). But
our recent experiences with two brand new (i.e. totally unused) MPC3102AT
hard discs suggested that the fault may lie elsewhere.

We installed the MPC3102AT into a computer, partitioned and formatted the
disk. Running e2fsck on the disk displayed many disk errors and sent several
inodes to lost+found. The same errors occurred on the second MPC3102AT
hard disc. We placed the hard discs into another computer (a PII 333), where
the BIOS correctly recognised them this time, and exactly the same problem
occurred. The kernel used in both cases was 2.2.12.

The IRQ/DMA timeouts are problematic as they seem to be related to
some major disk corruption. The kernel doesn't report the disk errors
until too late, and we end up losing about 90Mb of data to lost+found.
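
Since the warnings are easy to miss until the damage is done, one option is
to tally them per drive from the kernel log. The following is only a rough
Python sketch: it assumes the log lines are bare, as printed by dmesg, with
no syslog timestamp prefix, and tally is a hypothetical helper name.

```python
import re
from collections import Counter

# Match "<drive>: <event>" for the IDE messages of interest.
PATTERN = re.compile(
    r"^(hd[a-z]): (irq timeout|timeout waiting for DMA|DMA disabled|dma_intr)"
)


def tally(lines):
    """Count IDE warning events per (drive, event) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample input: the first block of messages quoted earlier in this post.
sample = [
    "hdb: irq timeout: status=0xd0 { Busy }",
    "hda: DMA disabled",
    "ide0: reset: success",
    "hdb: irq timeout: status=0xd0 { Busy }",
    "ide0: reset: success",
    "hdb: irq timeout: status=0xd0 { Busy }",
    "ide0: reset: success",
]

for (drive, event), count in sorted(tally(sample).items()):
    print(drive, event, count)
```

A count that keeps climbing for one drive would be a cue to run e2fsck
before more data ends up in lost+found.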

The hard discs we use are:
FUJITSU MPC3102ATE
Maxtor 91080D5
Maxtor 91010D6
IBM-DTTA-371440

The servers are:
Gigabyte 6A-6BXD Intel 440BX AGP chipset (motherboard)
Pentium II 400MHz
512Mb RAM
Intel EtherExpress Pro 10/100

The hdparm settings we use on all hard discs are:

/dev/hdc:
multcount = 16 (on)
I/O support = 1 (32-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 19590/16/63, sectors = 19746720, start = 0
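
As a sanity check on the geometry line, the sector count should equal
cylinders x heads x sectors per track. A minimal Python sketch of that
arithmetic follows. The 512-byte sector size is an assumption (it is not
shown in the hdparm output), and chs_capacity is a hypothetical helper name.

```python
SECTOR_BYTES = 512  # assumption: standard 512-byte sectors


def chs_capacity(cylinders, heads, sectors_per_track):
    """Return (total sectors, total bytes) for a CHS geometry."""
    total_sectors = cylinders * heads * sectors_per_track
    return total_sectors, total_sectors * SECTOR_BYTES


# Values taken from the hdparm geometry line above.
total_sectors, total_bytes = chs_capacity(19590, 16, 63)

print("sectors:", total_sectors)
print("bytes:  ", total_bytes)
print("MiB:    ", round(total_bytes / 2**20, 1))
```

The product comes to 19746720, which matches the sectors value hdparm
reports, so the geometry for /dev/hdc is at least self-consistent.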

I've heard in the past that overclocking the CPU can cause the sort of
problem that I am seeing. However the CPUs are not overclocked on any
of the servers.

Does anybody have any thoughts on this?

Regards,

Nick Burrett.
