[SCSI] system behaving ungracefully on failure of non-essential disk

Marc Haber (Marc.Haber-usenet-9910@gmx.de)
Wed, 27 Oct 1999 12:34:59 GMT

Hi!

My test system has three Quantum Atlas I disks (2 GB each: sda at ID 1,
sdb at ID 2 and sdc at ID 3) on a single Fast SCSI host adapter with an
NCR chipset. A fourth disk, a 4 GB Seagate drive (sdd, ID 4), holds the
system itself. sdb and sdc have switches in their power supply leads so
that I can simulate their failure. sda, sdb and sdc are combined into a
RAID 5 array using Linux software RAID, with the current kernel patches
applied to kernel 2.2.12.

Sometimes, when I switch off a disk to simulate its failure, the
system becomes unusable. It still responds to pings over the network,
but shells become unresponsive, no console login is possible and new
telnet connections are refused. The system console shows zillions of
SCSI resets and read errors on ID 3 (which was the "failed" sdc in
this case). I usually wait about two hours before I finally cut power.

The folks on the RAID mailing list tell me this is the SCSI layer
trying _very_ hard to access the failed disk (a syslog excerpt can be
mailed on request) instead of eventually giving up and letting the
RAID layer take over. However, they were not able to suggest a way to
work around this behavior of the SCSI layer.
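
To make concrete what I mean by "giving up and letting the RAID layer
take over", here is a rough sketch in Python. It is not kernel code and
does not reflect how the SCSI or RAID drivers are actually written; all
names (read_with_bounded_retries, raid5_read, DiskFailed, max_retries)
are made up for illustration, and it assumes a simplified layout in
which the same block number on every member forms one stripe.

```python
class DiskFailed(Exception):
    """Raised when a member disk is declared dead after bounded retries."""


def read_with_bounded_retries(read_block, block_no, max_retries=3):
    """Call read_block(block_no) at most max_retries times, then give up."""
    last_error = None
    for _ in range(max_retries):
        try:
            return read_block(block_no)
        except IOError as err:
            last_error = err  # remember the error and try again
    raise DiskFailed(f"gave up after {max_retries} attempts: {last_error}")


def raid5_read(disks, failed, index, block_no):
    """Read a block from disks[index]; on failure, rebuild it from the others.

    `disks` is a list of callables that return bytes (hypothetical stand-ins
    for the member disks); `failed` is a set of member indices that have
    already been kicked out of the array.
    """
    if index not in failed:
        try:
            return read_with_bounded_retries(disks[index], block_no)
        except DiskFailed:
            failed.add(index)  # kick the member instead of retrying forever
    # Degraded mode: XOR the same block from all surviving members.
    result = None
    for i, disk in enumerate(disks):
        if i == index:
            continue
        if i in failed:
            raise DiskFailed("more than one member failed; data is lost")
        data = read_with_bounded_retries(disk, block_no)
        if result is None:
            result = data
        else:
            result = bytes(a ^ b for a, b in zip(result, data))
    return result
```

The point of the sketch is only that the number of retries is bounded:
once a member has been marked as failed, later reads skip it entirely
and go straight to reconstruction from the remaining disks.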

A system that becomes unusable and needs to be rebooted when a single
disk fails defeats the entire purpose of RAID, because there is no
fault tolerance. I do see this as a problem.

Is there any way to make the SCSI layer give up earlier on read
errors for certain disks or partitions that belong to a RAID array?

Any hints will be appreciated.

Greetings
Marc

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
