[SCSI] system behaving ungracefully on failure of non-essential disk

Marc Haber (Marc.Haber-usenet-9910@gmx.de)
Wed, 27 Oct 1999 12:34:59 GMT


Hi!

My test system has three Quantum Atlas I disks (2 GB each: sda at
ID=1, sdb at ID=2 and sdc at ID=3) on a single Fast SCSI host adapter
with an NCR chipset. The fourth disk is a 4 GB Seagate drive (sdd,
ID=4) which holds the system itself. sdb and sdc have switches in
their power supply leads so that I can simulate their failure. sda,
sdb and sdc are combined into a RAID 5 array using Linux software
RAID with the current kernel patches on kernel 2.2.12.

Sometimes, when I switch off a disk to simulate its failure, the
system becomes unusable. It still responds to pings over the network,
but shells become unresponsive, no console login is possible and new
telnet connections are refused. The system console shows zillions of
SCSI resets and read errors on ID 3 (which happened to be the "failed"
sdc). I usually wait about two hours before I finally cut power.

The folks on the RAID mailing list tell me this is the SCSI layer
trying _very_ hard to access the failed disk (syslog excerpt can be
mailed on request) instead of giving up eventually and letting the
RAID layer take over. However, they weren't able to provide a way to
circumvent this behavior of the SCSI layer.

The system becoming unusable and needing a reboot defeats the entire
purpose of RAID, because the array provides no fault tolerance in
that case. I do see this as a problem.

Is there any way to make the SCSI layer give up earlier on read
errors for certain disks or partitions that belong to a RAID array?
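
To illustrate what I mean by "giving up earlier", here is a minimal
sketch in C of a bounded retry policy. It is not the kernel's SCSI
code and does not use any kernel interface; every name in it
(bounded_io, io_attempt_fn, fake_disk, fake_read) is made up for
illustration, and the retry limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: one I/O attempt, 0 on success, nonzero on error. */
typedef int (*io_attempt_fn)(void *ctx);

enum io_result { IO_OK = 0, IO_DEVICE_FAILED = -1 };

/*
 * Try an I/O at most max_attempts times. If every attempt fails, report
 * the device as failed instead of retrying forever, so that the caller
 * (for example a RAID layer) can carry on in degraded mode.
 */
static enum io_result bounded_io(io_attempt_fn attempt, void *ctx,
                                 int max_attempts)
{
    assert(max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        if (attempt(ctx) == 0)
            return IO_OK;
    }
    return IO_DEVICE_FAILED;
}

/* Simulated disk: "powered" mimics the switch in the power supply lead. */
struct fake_disk {
    int powered;   /* 1 = disk has power, 0 = switched off */
    int attempts;  /* number of I/O attempts seen so far */
};

/* Simulated read: counts the attempt and fails if the disk has no power. */
static int fake_read(void *ctx)
{
    struct fake_disk *d = ctx;

    d->attempts++;
    return d->powered ? 0 : 1;
}
```

With a switched-off fake_disk, bounded_io(fake_read, &disk, 3) makes
exactly three attempts and then returns IO_DEVICE_FAILED, which is
the kind of behavior I would like to see for the failed sdc.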

Any hints will be appreciated.

Greetings
Marc

-- 
-------------------------------------- !! No courtesy copies, please !! -----
Marc Haber          |   " Questions are the         | Mail address in header
Karlsruhe, Germany  |     Beginning of Wisdom "     | Fon: *49 721 966 32 15
Nordisch by Nature  | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
