OOPS: scsi, AHA1542, CD changer

From: Steven S. Dick (ssd@nevets.oau.org)
Date: Wed Aug 09 2000 - 23:35:39 EST


I'm not sure which layer caused this oops, but it is probably repeatable.

Oops first, then summary:

Linux nevets 2.4.0-test4 #2 Thu Jul 20 08:53:51 EDT 2000 i586 unknown

Unable to handle kernel NULL pointer dereference at virtual address 000000c4
 printing eip:
c01a43b0
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[scsi_allocate_device+384/612]
EFLAGS: 00010046
eax: 00000006 ebx: 00000000 ecx: c03ed000 edx: c03e05c0
esi: c03e0ec8 edi: c03e0ec0 ebp: c2a31df8 esp: c2a31dc0
ds: 0018 es: 0018 ss: 0018
Process xmms (pid: 5434, stackpage=c2a31000)
Stack: 00000218 c0c72860 c2a31e5c 00000086 c03e0ec8 c03ed000 00000000 00000004
       c01705bf c0c72860 c03e0ef0 c03e0ee8 c03e0ee8 000f4240 c2a31e2c c01a9ced
       c03e0ec0 00000000 00000000 00000282 c0ab9200 c2a31e5c 000f4240 c0c72860
Call Trace: [generic_make_request+1843/1940] [scsi_request_fn+409/760] [generic_unplug_device+32/40] [__wait_on_buffer+162/216] [bread+71/112] [isofs_find_entry+151/1420] [path_walk+1818/2028]
       [isofs_cmp+22/28] [isofs_lookup+40/128] [real_lookup+79/156] [path_walk+1442/2028] [open_namei+120/1452] [<f2f4e865>] [<e8c6dff5>] [filp_open+48/80]
       [sys_open+54/172] [system_call+52/64]
Code: c7 83 c4 00 00 00 ff ff 00 00 c7 83 f8 00 00 00 00 00 00 00

Attached devices:
Host: scsi1 Channel: 00 Id: 06 Lun: 00
  Vendor: NRC Model: MBR-7 Rev: 110
  Type: CD-ROM ANSI SCSI revision: 02
[7 disk changer, LUN 0-6, rest deleted for brevity]

I think this is a long standing bug in something or other.
I think we keep chipping pieces away, and the symptoms keep changing.
Originally, this was a fatal panic caused by a bug in the AHA1542 reset code.
Someone changed something between test-1 and test-5, and now instead
of causing a scsi reset, it causes the above.

Here's how I [believe] this is triggered...

I was flipping between two mp3's with xmms that happened to be on different
cdroms in my cd changer. I think the start of both files ended up in
cache, so I was playing the cached part of the mp3 on disk 5 while disk 3 was
loaded. When it ran out of cached data, it had to suddenly change disks.

The disk change process is lengthy--between 3 and 6 seconds plus spin
down and spin up time. Rapid changes between disks or any other cdrom
related activity that doesn't happen immediately seems to reliably cause
problems. (I suspect scanning the sectors of a disk in reverse order
(requiring frequent lengthy seeks) also would cause this, but I've yet
to test it.)

In previous kernels, I would get scsi bus resets until the offending
process was killed, or the cdrom was removed from the drive, causing
an I/O error. Subsequent attempts to access the device in any way puts
the process in uninterruptable disk wait. Other devices (i.e., other
slots in the same cd changer) work just fine.

In this kernel, instead of bus resets, I got the OOPS above.
This OOPS (unlike previous ones) doesn't seem to be AHA1542 related.
(Maybe all the bugs in that driver leading up to this have finally
been smoothed over enough to find the underlying bug.)

Can someone make sense of this one?

        Steve
ssd@nevets.oau.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Aug 15 2000 - 21:00:20 EST