Re: oops after scsi reset (ncr53c8xx)

Gerard Roudier (groudier@club-internet.fr)
Mon, 4 Jan 1999 22:02:33 +0100 (MET)


On Sun, 3 Jan 1999, Manfred Spraul wrote:

> linux crashes on my machine (Pentium 100, 64Mb) if I load "ncr53c8xx" in
> "rc.sysinit".
>
> 1) The crash only occurs if I attach an old CD-Writer:
> the device doesn't understand the LUN's, and the
> driver forces an scsi reset.

Did you give a try with MULTI_LUN option set to N?

> 2) the crash doesn't occur if I load the driver outside rc.sysinit
> 3) if I add "sleep 10" arround the "insmod", then
> the computer doesn't crash.

What happens if you only load the driver module in rc.sysinit ?

> As far as I understand the Oops, the oops is caused during the completion of
> the scsi reset.
>
> The exact cause is: lp seems to be 0x88080000

Yes, it is not a valid kernel virtual address. Should look like
0xcxxxxxxx if it is valid.

> in ncr53c8xx, function ncr_complete(), line 5747:
> if (lp && lp->held_ccb)
> {
> }

The dereference of 'lp' invalid pointer can only fault.

> I don't understand yet how& why ncr_complete() was called.

Perhaps, 'what called the ncr_complete() function?' would be the right
question to ask.

[ ... ]

> My hostadapter:
> 53c810a
> revision: 0x12
> port address 0x7c00
> I had to switch to Port mapped Io, memory mapped Io failed completely.

BTW, I successfully use the same chip revision with MMIO.

> ---------
> <alt>-<SysRq> still works, and if I press <Alt><SysRq><p> sereval times, I
> get
> EIP=10:c0111080 or ..87, all other registers are constant.
> /boot/System.map: c0110fc8 T panic.
> ------------ rc.sysinit:
> [...]
> insmod ncr53c8xx
> insmod fat
> insmod vfat
> insmod hfs
>
> # checking filesystems
> [...]
> echo "Mounting local filesystems"
> mount -a -t nonfs,proc
> ------------------- output on screen & ksymoops
> scsi: 1 host
> [further output from scsi & ncr]
>
> Mounting local filesystems
> Unable to kandle kernel paging request at virtual address 88080084
> -- ksymoops
> Options used: -V -o /usr/src/linux/modules -k /proc/ksyms -l
> /proc/modules -m /boot/System.map-2.2.0
>
> current->tss.cr3=00101000, %cr3= 00101000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c481034d>]
> EFLAGS: 00010086
> eax: 00000000 ebx: c4ca0080 ecx: c3ca0ac0 edx: c3ca0ac0
> esi: c3ca0080 edi: c000ca00 ebp: 88080000 esp: c01d7eb0

Stack depth is 0x1eb0 if 8K (kernel 2.1), and 0xeb0 if 4K (kernel 2.0).
That seems so low, that, may-be, a kernel stack overflow may just have
occurred. This should be inverstigated, in my opinion.

> ds: 0018 es: 0018 ss: 0018
> c3ca0ac0
> c4810400 c4811a49 c3ca0080 c3ca0080 00000046 0000000b c01d7f68
> c005860
> 0000000e c01fd100 c009d202 c4813df6 c3ca0080 c090780 23000001
> 0000000b
> Call Trace: [<c0169082>] [<c4811bf1>] [<c4810400>] [<c4811a49>] [<c4814df6>]
> [<c010899d>] [<c0108ab7>] [<c0108b9f>] [<c0107acb>] [<c010626d>]
> [<c0106000>] [<c0106290>] [<c0107a20>] [<c0106000>] [<c0100176>]
> code: 8b 85 84 00 00 00 85 c0 74 38 39 44 24 24 75 32 8d 5d 40 8d
>
> >>EIP: c481034d <ncr_complete+bd/530>
> Trace: c0169082 <ide_end_request+5e/68>

- ncr_int_sto() should have called ncr_complete().
- ncr_int_sto() should have been called by ncr_exception().

> Trace: c4811bf1 <ncr_int_sto+55/74>
> Trace: c4810400 <ncr_complete+170/530>
> Trace: c4811a49 <ncr_exception+235/388>
> Trace: c4814df6 <ncr53c8xx_intr+26/74>
> Trace: c010899d <handle_IRQ_event+31/68>
> Trace: c0108ab7 <do_8259A_IRQ+77/a0>
> Trace: c0108b9f <do_IRQ+23/40>
> Trace: c0107acb <ret_from_intr+f/20>
> Trace: c010626d <cpu_idle+5d/6c>
> Trace: c0106000 <get_options+0/74>
> Trace: c0106290 <sys_idle+14/24>
> Trace: c0107a20 <system_call+34/38>
> Trace: c0106000 <get_options+0/74>
> Trace: c0100176 <L6+0/2>
> Code: c481034d <ncr_complete+bd/530> 00000000 <_EIP>:
> Code: c481034d <ncr_complete+bd/530> 0: 8b 85 84 00 00 movl
> 0x84(%ebp),%eax
> Code: c4810352 <ncr_complete+c2/530> 5: 00
> Code: c4810353 <ncr_complete+c3/530> 6: 85 c0 testl
> %eax,%eax
> Code: c4810355 <ncr_complete+c5/530> 8: 74 38 je
> 42 <_EIP+0x42> c481038f <ncr_complete+ff/530>
> Code: c4810357 <ncr_complete+c7/530> a: 39 44 24 24 cmpl
> %eax,0x24(%esp,1)
> Code: c481035b <ncr_complete+cb/530> e: 75 32 jne
> 42 <_EIP+0x42> c481038f <ncr_complete+ff/530>
> Code: c481035d <ncr_complete+cd/530> 10: 8d 5d 40 leal
> 0x40(%ebp),%ebx
> Code: c4810360 <ncr_complete+d0/530> 13: 8d 00 leal
> (%eax),%eax
>
> Aiee, killing interrupt handler
>
> Aiee, killing interrupt handler
> Kernel panic: Attempted to kill the idle task!
> in swapper task - not syncing

I donnot have any better idea for the moment.

Regards,
Gerard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/