2.2.14 & adaptec 1542 not recovering from SCSI bus reset

From: synrg@bgpc.dymaxion.ca
Date: Tue Feb 15 2000 - 09:08:53 EST


I have had two incidents in which my system has hung under 2.2.14
using an Adaptec 1542CF that I just installed, replacing the old
Adaptec 1520A I had in there before.

First incident:

- I was in 'screen' ssh'd to this headless box, so I was able to
  look about a bit even though the disk was giving me IO errors
  every time I tried to access something on the disk
- The last thing logged to syslog (cached, but not written, as
  I later found out upon a reboot) was a SCSI bus reset
- After my investigation, I switched over a monitor & kbd to this
  box so I could see anything logged to the console .. unfortunately
  it had scrolled and so entirely consisted of ext2 complaining that
  it couldn't find certain inodes and had remounted the single
  filesystem on this box (/ on /dev/sda1) read-only
- I saw no kernel oops during the first incident, but that's not
  saying it didn't happen, as the console messages had scrolled
- Upon reboot, the e2fsck ran clean

Second incident (this morning):

- This time I immediately switched the monitor & kbd to the box
  to see if I could spot anything in the console messages
- I saw the tail end of a kernel oops ... but with only partial
  data and no way to capture it, I couldn't do anything with it
- I had no way of getting at syslog to see if there was a SCSI
  reset
- Upon reboot, there were some e2fsck errors that needed manual
  intervention, but nothing terribly damaging
- I saw no evidence of a SCSI bus reset this time, but that's not
  saying it didn't happen ... with no way to access syslog before
  the reboot, it could very well have been the same as in the first
  incident (i.e. the bus resets, it gets logged, but not written
  to disk before the system is rebooted)

Combing the list archives for the past little while, I did notice
a patch introduced in 2.2.14 (which I verified was applied in this
version) that alleges to fix a kernel oops when the SCSI bus resets.
I wonder if that patch solves the whole problem.

I'd give more info if I could. Unfortunately, the nature of my
problem (the box has a single ext2 filesystem which is immediately
remounted read-only when the crash occurs, preventing logging of
useful messages) makes it rather difficult to collect data. If
someone has tips as to how I can get some useful information, I'll
be happy to give it a try.

Finally, as to what might be causing the SCSI bus resets, I don't know
... perhaps the cat is playing with my temptingly dangly external SCSI
cable or the flimsy power cord for the scanner. I could try unhooking
the SCSI scanner from the box and setting termination on the card to
eliminate that as a possibility. Nevertheless, a bus reset shouldn't
cause my system to completely hang and require a reboot, should it?

Ben

--
    nSLUG       http://www.nslug.ns.ca      synrg@sanctuary.nslug.ns.ca
    Debian      http://www.debian.org       synrg@debian.org
    Chebucto    http://www.chebucto.ns.ca   aa458@chebucto.ns.ca
[ pgp key fingerprint = 7F DA 09 4B BA 2C 0D E0  1B B1 31 ED C6 A9 39 4F ]

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:29 EST