Strange bug with sr_mod

David Odin (dindinx@club-internet.fr)
Fri, 2 Jul 1999 04:59:10 +0200


Hi,

I'm using 2.3.6 and I have just experienced a very nasty bug:

It is not an oops, but I couldn't use my box anymore and I had to
reboot it the hard way...

I was reading a file (using more) on a CD-ROM when the bug appeared.
The current console froze first, and these messages were printed on the
screen:

scsi : aborting command due to timeout : pid 162529, scsi0, channel 0, id 1,
lun 0 Read (10) 00 00 00 61 7d 00 00 02 00
SCSI host 0 abort (pid 162529) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
SCSI host 0 abort (pid 162529) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
SCSI host 0 channel 0 reset (pid 162529) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
SCSI host 0 reset (pid 162529) timed out again -
probably an unrecoverable SCSI bus or device hang.
(scsi0:0:15:0) Synchronous at 40.0 Mbyte/sec, offset 8.
scsi : aborting command due to timeout : pid 162828, scsi0, channel 0, id 1,
lun 0 Prevent/Allow Medium Removal 00 00 00 01 00
SCSI host 0 abort (pid 162828) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:15:0) Synchronous at 40.0 Mbyte/sec, offset 8.
SCSI host 0 abort (pid 162828) timed out - resetting

The console wasn't responding to Ctrl-C, so I switched consoles in order to
kill the 'more' command that had triggered all this. But it couldn't be
killed (even with the KILL signal), and another two lines were printed every
5 seconds or so.

The CD-ROM couldn't be unmounted because of the unkillable 'more' command,
so I couldn't use my CD any more. To get rid of these very annoying
messages, I had to reboot my box, and since the filesystems couldn't be
unmounted, I had to press the 'Reset' button :(, with all the fscking fsck
of every uncleanly unmounted partition...

Here is some info about my box:
-------------------------------------------------------------------------
/proc/scsi/scsi:

Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: NEC Model: CD-ROM DRIVE:500 Rev: 2.5
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 15 Lun: 00
Vendor: FUJITSU Model: MAB3045SP Rev: 0109
Type: Direct-Access ANSI SCSI revision: 02

-------------------------------------------------------------------------
/proc/scsi/aic7xxx/0:

Adaptec AIC7xxx driver version: 5.1.17/3.2.4
Compile Options:
TCQ Enabled By Default : Disabled
AIC7XXX_PROC_STATS : Disabled
AIC7XXX_RESET_DELAY : 5

Adapter Configuration:
SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
Ultra Wide Controller
PCI MMAPed I/O Base: 0xe5800000
Adapter SEEPROM Config: SEEPROM found and used.
Adaptec SCSI BIOS: Enabled
IRQ: 11
SCBs: Active 0, Max Active 2,
Allocated 15, HW 16, Page 255
Interrupts: 46203
BIOS Control Word: 0x18a6
Adapter Control Word: 0x005e
Extended Translation: Enabled
Disconnect Enable Flags: 0xffff
Ultra Enable Flags: 0x8000
Tag Queue Enable Flags: 0x0000
Ordered Queue Tag Flags: 0x0000
Default Tag Queue Depth: 8
Tagged Queue By Device array for aic7xxx host instance 0:
{255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
Actual queue depth per device for aic7xxx host instance 0:
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

Statistics:

(scsi0:0:1:0)
Device using Narrow/Async transfers.
Transinfo settings: current(0/0/0/0), goal(0/0/0/0), user(12/15/1/0)
Total transfers 75 (75 reads and 0 writes)

(scsi0:0:15:0)
Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8
Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0)
Total transfers 46044 (41979 reads and 4065 writes)

-------------------------------------------------------------------------

Can anyone tell me what to do in such a case? I strongly believe this
is a bug in the aic7xxx driver. I guess it is normal to get some
strange messages on the console when a bad CD-ROM is read, but I don't think
it is supposed to end in a deadlock, resetting the SCSI bus again and again.

Strangely enough, my HDD, which is connected to the same SCSI card, was OK
during the bug, and everything else in my box was fine.

Any thoughts?

TIA,

DindinX

-- 
David.Odin@bigfoot.com

"Prepare to be bored!" -- Angelica
