Re: Who owns qlogicisp?

From: Peter Rival (frival@zk3.dec.com)
Date: Fri Jun 16 2000 - 15:16:17 EST


Tigran Aivazian wrote:

> the file Documentation/BUG-HUNTING tells you what to do. Roughly, you need
> to do this:
>

*mumble* Knew sending an email like that without enough caffeine would
make it sound, erm, less intelligent than it should have... *sigh*

>
> a) why try to narrow down a single person when you can send your
> message to a dozen (or a hundred - the more the merrier) of approximately
> appropriate people?
>

Point taken. I was just trying to conserve bandwidth...

>
> b) post your oops, thoughts, analysis, data to linux-kernel list and
> someone maybe (just maybe) will be interested enough (and have time) to
> fix your problem.
>

This is repeatable with 2.4.0-test1-ac19. The oops occurs when running
the AIM VII File Server (fserver) workload. I know people don't like AIM
as a benchmark, but all the stuff it does is completely valid (at the
very least it shouldn't oops). What is interesting is that it survives
the Shared System (shared) workload, which is lighter on I/O than
fserver, so this appears to be a load-related issue.

Oops:

Tasks jobs/min jti j/m/task real cpu
    1 10.42 100 10.4161 581.79 10.00 Tue Jun 16 18:20:27 2048
Tasks jobs/min jti j/m/task real cpu
   50Unable to handle kernel paging request at virtual address 0000000000000210

CPU 3 swapper(0): Oops 1
pc = [<fffffc0000946850>] ra = [<fffffc0000946850>] ps = 0007
v0 = 0000000000000000 t0 = 0000000000000000 t1 = ffffffffffec3960
t2 = fffffc00fffc6000 t3 = fffffc0000af6df8 t4 = 0000000000000000
t5 = fffffc0000a57a74 t6 = fffffc0000a55bc0 t7 = fffffc00fffe4000
s0 = 0000000000000000 s1 = 0000000000000002 s2 = fffffc0000af6c00
s3 = fffffc0000af6ce8 s4 = 0000000000000003 s5 = 000000000000000c
s6 = fffffc00fffe7c08
a0 = fffffc00fffc6040 a1 = fffffc0000af6c00 a2 = fffffc00fffe7c08
a3 = 0000000000000008 a4 = 0000000000002000 a5 = 000000000000000a
t8 = 0080000010000000 t9 = 0000101000000100 t10= 0000000001000000
t11= 0080000000000000 pv = fffffc000081de80 at = fffffd01a0000000
gp = fffffc0000a830c0 sp = fffffc00fffe7b08
Code: b28901c0 stl a4,448(s0)
 b26901c4 stl a3,452(s0)
 28300000 ldbu t0,0(a0)
 402075a1 cmpeq t0,3,t0
 e4200005 blt t0,.+24
 d3400046 bsr ra,.+284
*b0090210 stl v0,528(s0)
 c3e00004 br .+20
Trace:<4>qlogicisp.c:973 spinlock stuck in swapper at fffffc00009464b4(2) owner swapper at fffffc00009464b4(3) qlogicisp.c:973
9464cc 816854 8179f0 820824 <4>qlogicisp.c:973 spinlock stuck in swapper at fffffc00009464b4(0) owner swapper at fffffc00009464b4(3) qlogicisp.c:973
qlogicisp.c:973 spinlock stuck in swapper at fffffc00009464b4(1) owner swapper at fffffc00009464b4(3) qlogicisp.c:973
818234 810cb8 8128a0 821140 845918 812880 810044
810ec8 9bde60
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In interrupt handler - not syncing

ksymoops output:

[root@schooner /root]# ksymoops -m ~frival/fin/linux/System.map < oops
ksymoops 2.3.3 on alpha 2.4.0-test1-ac18. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0-test1-ac18/ (default)
     -m /home/frival/fin/linux/System.map (specified)
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?

Unable to handle kernel paging request at virtual address 0000000000000210
CPU 3 swapper(0): Oops 1
pc = [<fffffc0000946850>] ra = [<fffffc0000946850>] ps = 0007
Using defaults from ksymoops -t elf64-alpha -a alpha
v0 = 0000000000000000 t0 = 0000000000000000 t1 = ffffffffffec3960
t2 = fffffc00fffc6000 t3 = fffffc0000af6df8 t4 = 0000000000000000
t5 = fffffc0000a57a74 t6 = fffffc0000a55bc0 t7 = fffffc00fffe4000
s0 = 0000000000000000 s1 = 0000000000000002 s2 = fffffc0000af6c00
s3 = fffffc0000af6ce8 s4 = 0000000000000003 s5 = 000000000000000c
s6 = fffffc00fffe7c08
a0 = fffffc00fffc6040 a1 = fffffc0000af6c00 a2 = fffffc00fffe7c08
a3 = 0000000000000008 a4 = 0000000000002000 a5 = 000000000000000a
t8 = 0080000010000000 t9 = 0000101000000100 t10= 0000000001000000
t11= 0080000000000000 pv = fffffc000081de80 at = fffffd01a0000000
gp = fffffc0000a830c0 sp = fffffc00fffe7b08
Code: b28901c0 stl a4,448(s0)
Warning (Oops_read): Code line not seen, dumping what data is available
>>PC; fffffc0000946850 <isp1020_intr_handler+330/440> <=====
Trace: fffffc00009464cc fffffc0000816854 fffffc00008179f0 fffffc0000820824
fffffc0000818234 fffffc0000810cb8 fffffc00008128a0 fffffc0000821140
fffffc0000845918 fffffc0000812880 fffffc0000810044
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; fffffc00009464cc <do_isp1020_intr_handler+6c/c0>
Trace; fffffc0000816854 <handle_IRQ_event+d4/160>
Trace; fffffc00008179f0 <handle_irq+170/200>
Trace; fffffc0000820824 <clipper_srm_device_interrupt+24/40>
Trace; fffffc0000818234 <do_entInt+114/1a0>
Trace; fffffc0000810cb8 <ret_from_sys_call+0/24>
Trace; fffffc00008128a0 <cpu_idle+40/60>
Trace; fffffc0000821140 <do_check_pgt_cache+0/260>
Trace; fffffc0000845918 <generic_file_write+3f8/8c0>
Trace; fffffc0000812880 <cpu_idle+20/60>
Trace; fffffc0000810044 <__smp_callin+24/28>
3 warnings issued. Results may not be reliable.

(And does anyone know what the "Code line not seen" bit is all about?)

General thought so far:

    It appears that somehow the SCSI command pointed to by Cmnd (as set in
qlogicisp.c:1031) is NULL at some point in the loop, and we never check
for that case. My question is: is that a valid state for the command
slot, and if so, shouldn't we just continue on to the next one?
Essentially, what we have is this:

while (out_ptr != in_ptr) {
    u_int cmd_slot;

    sts = (struct Status_Entry *) &hostdata->res_cpu[out_ptr];
    out_ptr = (out_ptr + 1) & RES_QUEUE_LEN;

    cmd_slot = sts->handle;
    Cmnd = hostdata->cmd_slots[cmd_slot];
    /* should we check that Cmnd is not NULL here, and if it is, just continue? */
    hostdata->cmd_slots[cmd_slot] = NULL;

    /* ... */
    if (sts->hdr.entry_type == ENTRY_STATUS)
        Cmnd->result = isp1020_return_status(sts);  /* <== error occurs here */
    else
        Cmnd->result = DID_ERROR << 16;
    /* ... */
}

So, anyone see anything they like? :)

 - Pete
