PROBLEM: SMP, SCSI and test11,12 cause repeatable oops

From: Aron Rosenberg (amr42@cornell.edu)
Date: Tue Dec 12 2000 - 12:22:23 EST


Hello all, this is my first bug report, so bear with me (I'm trying to
follow the directions). This was already sent to linux-smp and linux-scsi,
but got no answer.

Alright, I hope this helps everybody!

Aron Rosenberg
amr42@cornell.edu
Video Conferencing for Linux
http://cu30.sourceforge.net

<begin Bug report>
[1] SMP machine keeps oopsing and crashing on heavy, or indeed any, SCSI
disk access. This can be reproduced at low, medium and high CPU usage. File
system is ext2.

[2] The kernel oopses and panics when a user runs a large file operation
from a shell on a SCSI disk under an SMP kernel. The easiest way to trigger
it is to unpack the kernel sources; the machine oopses shortly after the
unpack starts. This is easily repeatable on test11, test11-pre8 and test12
in SMP mode. It does not happen in UP mode.

[3] Keywords: smp kernel

[4] Version: 2.4.0-test11, 2.4.0-test11-pre8, 2.4.0-test12

[5] Oops message and stack trace, decoded with ksymoops

Unable to handle Kernel NULL pointer dereference at virtual address 0000003d
*pde = 00000000
Eip: 0010:[<c01331bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
Eflags: 00010007
eax: 00000001 ebx: c122d628 ecx: 00000046 edx: 00000001
esi: c832fc20 edi: 00000202 ebp: c832fc68 esp: c166de50
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c166d000)
Stack: c832fc20 d7eb20ac d7eb2000 00000050 c019a52d c832fc20 00000001 d7eb2000
  d7eb2000 00000000 c02db174 c019a829 d7eb2000 00000001 000000f8 00000001
  00000001 00000001 d7eb2000 000000f8 c166df08 c02db178 c02db17c 0000001f
Call Trace: [<c019a52d>] [<c019a829>] [<c01b8b9c>] [<c0199b1a>] [<c01abae0>]
  [<c01a0524>] [<c01abda1>]
  [<c010be61>] [<c010c056>] [<c0108934>] [<c0108934>] [<c010a7d8>]
[<c018934>] [<c0108934>] [<c0100018>]
  [<c0108960>] [<c01089c2>] [<c01c6ea5>] [<c0171572>]
Code: 81 78 3c 48 31 13 c0 75 06 f6 40 18 04 75 5d 8b 40 28 39 f0

>>EIP; c01331bc <end_buffer_io_async+74/f4> <=====
Trace; c019a52d <__scsi_end_request+99/144>
Trace; c019a829 <scsi_io_completion+169/34c>
Trace; c01b8b9c <rw_intr+154/15c>
Trace; c0199b1a <scsi_old_done+5a6/5c4>
Trace; c01abae0 <aic7xxx_isr+d8/310>
Trace; c01a0524 <aic7xxx_done_cmds_complete+28/38>
Trace; c01abda1 <do_aic7xxx_isr+89/ac>
Trace; c010be61 <handle_IRQ_event+51/7c>
Trace; c010c056 <do_IRQ+9a/ec>
Trace; c0108934 <default_idle+0/34>
Trace; c0108934 <default_idle+0/34>
Trace; c010a7d8 <ret_from_intr+0/20>
Trace; 0c018934 Before first symbol
Trace; c0108934 <default_idle+0/34>
Trace; c0100018 <startup_32+18/cc>
Trace; c0108960 <default_idle+2c/34>
Trace; c01089c2 <cpu_idle+3a/50>
Trace; c01c6ea5 <vgacon_cursor+1e9/1f4>
Trace; c0171572 <set_cursor+6e/84>
Code; c01331bc <end_buffer_io_async+74/f4>
0000000000000000 <_EIP>:
Code; c01331bc <end_buffer_io_async+74/f4> <=====
    0: 81 78 3c 48 31 13 c0 cmpl $0xc0133148,0x3c(%eax) <=====
Code; c01331c3 <end_buffer_io_async+7b/f4>
    7: 75 06 jne f <_EIP+0xf> c01331cb <end_buffer_io_async+83/f4>
Code; c01331c5 <end_buffer_io_async+7d/f4>
    9: f6 40 18 04 testb $0x4,0x18(%eax)
Code; c01331c9 <end_buffer_io_async+81/f4>
    d: 75 5d jne 6c <_EIP+0x6c> c0133228 <end_buffer_io_async+e0/f4>
Code; c01331cb <end_buffer_io_async+83/f4>
    f: 8b 40 28 mov 0x28(%eax),%eax
Code; c01331ce <end_buffer_io_async+86/f4>
   12: 39 f0 cmp %esi,%eax

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!

1 warning issued. Results may not be reliable.
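
The register dump and the disassembly above can be cross-checked against
each other with a little arithmetic. The sketch below uses only values that
appear in the oops (eax, EIP, the +74 function offset and the first seven
code bytes); the variable names are made up for illustration. It checks that
the faulting operand address is eax + 0x3c = 0x3d, and that the constant in
the cmpl is the start address of end_buffer_io_async itself.

```python
# Values copied from the oops above; variable names are illustrative only.
eax = 0x00000001          # register dump: eax
eip = 0xC01331BC          # >>EIP; c01331bc <end_buffer_io_async+74/f4>
offset_in_func = 0x74     # the "+74" part of the symbol
code = bytes.fromhex("81783c483113c0")  # cmpl $0xc0133148,0x3c(%eax)

# Layout of this instruction: opcode 0x81, ModRM 0x78, disp8, imm32.
disp8 = code[2]
imm32 = int.from_bytes(code[3:7], "little")

fault_address = eax + disp8        # address the cmpl tried to read
func_start = eip - offset_in_func  # start of end_buffer_io_async

print(f"faulting address : {fault_address:#010x}")
print(f"compared constant: {imm32:#010x}")
print(f"function start   : {func_start:#010x}")
```

Since eax = 1 is not a valid pointer, the code was presumably following a
corrupted pointer when it faulted. That reading is an inference from the
disassembly, not something the oops states directly.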
-------------------------------------------------------------
I had to copy this down by hand, so I hope everything is correct.

Repeat of the oops with no CPU load:

Unable to handle Kernel NULL pointer dereference at virtual address 0000003d
*pde = 00000000
Eip: 0010:[<c01331bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
Eflags: 00010007
eax: 00000001 ebx: c12aefe4 ecx: 00000046 edx: 00000001
esi: ca1abd40 edi: 00000202 ebp: ca1abd88 esp: c166de50
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c166d000)
Stack: c832fc20 d7eb20ac d7eb2000 00000050 c019a52d c832fc20 00000001 d7eb2000
  d7eb2000 00000000 c02db174 c019a829 d7eb2000 00000001 000000f8 00000001
  00000001 00000001 d7eb2000 000000f8 c166df08 c02db178 c02db17c 0000001f
Call Trace: [<c019a52d>] [<c019a829>] [<c01b8b9c>] [<c0199b1a>] [<c01abae0>]
  [<c01a0524>] [<c01abda1>]
  [<c010be61>] [<c010c056>] [<c0108934>] [<c0108934>] [<c010a7d8>]
[<c018934>] [<c0108934>] [<c0100018>]
  [<c0108960>] [<c01089c2>] [<c01c6ea5>] [<c0171572>]
Code: 81 78 3c 48 31 13 c0 75 06 f6 40 18 04 75 5d 8b 40 28 39 f0

>>EIP; c01331bc <end_buffer_io_async+74/f4> <=====
Trace; c019a52d <__scsi_end_request+99/144>
Trace; c019a829 <scsi_io_completion+169/34c>
Trace; c01b8b9c <rw_intr+154/15c>
Trace; c0199b1a <scsi_old_done+5a6/5c4>
Trace; c01abae0 <aic7xxx_isr+d8/310>
Trace; c01a0524 <aic7xxx_done_cmds_complete+28/38>
Trace; c01abda1 <do_aic7xxx_isr+89/ac>
Trace; c010be61 <handle_IRQ_event+51/7c>
Trace; c010c056 <do_IRQ+9a/ec>
Trace; c0108934 <default_idle+0/34>
Trace; c0108934 <default_idle+0/34>
Trace; c010a7d8 <ret_from_intr+0/20>
Trace; 0c018934 Before first symbol
Trace; c0108934 <default_idle+0/34>
Trace; c0100018 <startup_32+18/cc>
Trace; c0108960 <default_idle+2c/34>
Trace; c01089c2 <cpu_idle+3a/50>
Trace; c01c6ea5 <vgacon_cursor+1e9/1f4>
Trace; c0171572 <set_cursor+6e/84>
Code; c01331bc <end_buffer_io_async+74/f4>
0000000000000000 <_EIP>:
Code; c01331bc <end_buffer_io_async+74/f4> <=====
    0: 81 78 3c 48 31 13 c0 cmpl $0xc0133148,0x3c(%eax) <=====
Code; c01331c3 <end_buffer_io_async+7b/f4>
    7: 75 06 jne f <_EIP+0xf> c01331cb <end_buffer_io_async+83/f4>
Code; c01331c5 <end_buffer_io_async+7d/f4>
    9: f6 40 18 04 testb $0x4,0x18(%eax)
Code; c01331c9 <end_buffer_io_async+81/f4>
    d: 75 5d jne 6c <_EIP+0x6c> c0133228 <end_buffer_io_async+e0/f4>
Code; c01331cb <end_buffer_io_async+83/f4>
    f: 8b 40 28 mov 0x28(%eax),%eax
Code; c01331ce <end_buffer_io_async+86/f4>
   12: 39 f0 cmp %esi,%eax

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!

1 warning issued. Results may not be reliable.
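
To make the interrupt path in the two traces easier to read, the "Trace;"
lines can be pulled apart mechanically. This is only a sketch: the regular
expression and the function name parse_trace are assumptions made for
illustration, and the input is a subset of the lines above. Entries that
ksymoops could not resolve (such as "Before first symbol") are skipped. Note
that the kernel builds the trace by scanning the stack, so it may contain
stale entries and the resulting chain is approximate.

```python
import re

# A subset of the "Trace;" lines copied from the ksymoops output above.
KSYMOOPS_TRACE = """\
Trace; c019a52d <__scsi_end_request+99/144>
Trace; c019a829 <scsi_io_completion+169/34c>
Trace; c01b8b9c <rw_intr+154/15c>
Trace; c0199b1a <scsi_old_done+5a6/5c4>
Trace; c01abae0 <aic7xxx_isr+d8/310>
Trace; c01a0524 <aic7xxx_done_cmds_complete+28/38>
Trace; c01abda1 <do_aic7xxx_isr+89/ac>
Trace; c010be61 <handle_IRQ_event+51/7c>
Trace; c010c056 <do_IRQ+9a/ec>
Trace; 0c018934 Before first symbol
"""

# address <symbol+offset/size>, all numbers in hex
TRACE_RE = re.compile(r"^Trace; ([0-9a-f]+) <([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>$")


def parse_trace(text):
    """Return (address, symbol, offset, size) tuples for resolved Trace lines."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if m:  # unresolved entries do not match and are skipped
            addr, sym, off, size = m.groups()
            frames.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return frames


frames = parse_trace(KSYMOOPS_TRACE)
# The trace lists the innermost frame first; reverse it to read from IRQ entry.
irq_path = [sym for _, sym, _, _ in reversed(frames)]
print(" -> ".join(irq_path))
```

Read this way, the fault is reached from the aic7xxx interrupt handler
through SCSI request completion, which fits the "Process swapper" and
"killing interrupt handler" lines: the oops happens in interrupt context
while the CPU was idle.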

[6] How to reproduce
This all happens under heavy disk usage on the SCSI system. If I try to
compile the test11 kernel on a test11 boot with make -j3 or -j4, processes
segfault, and once it oopsed. The oops above happened while
untarring/verifying a 358 MB file on the SCSI drive with low to medium
processor usage; I can reproduce it on an idle machine too. The file is a
358 MB script file/tarball, and running cksum or md5sum over it to verify it
makes the kernel oops. Untarring a kernel source tree also causes a panic.
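
For anyone who wants to generate a comparable load without the original
358 MB file, here is a minimal sketch of a write-then-verify loop. It is not
the procedure used above (that was tar, cksum and md5sum); the function name
write_and_verify, the file name stress.bin and the parameters are all
hypothetical, and SHA-256 stands in for the checksum tools. The directory
passed in would need to live on the SCSI disk.

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per write/read call


def write_and_verify(directory, size_mb=358, passes=3):
    """Write size_mb MiB into `directory`, then re-read and checksum it.

    Returns True if every pass reads back the data that was written.
    """
    path = os.path.join(directory, "stress.bin")  # hypothetical file name
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(CHUNK)
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the disk
    try:
        for _ in range(passes):
            reread = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(CHUNK), b""):
                    reread.update(block)
            if reread.hexdigest() != written.hexdigest():
                return False
        return True
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Small demo in a temporary directory; point `directory` at a mount
    # on the SCSI disk and use the default size for a real run.
    with tempfile.TemporaryDirectory() as d:
        print(write_and_verify(d, size_mb=2, passes=2))
```

One caveat: re-reads may be served from the page cache rather than the disk,
so a file larger than RAM, or several files, exercises the SCSI path harder.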

[7.0] Machine type
Dell Precision Workstation 610 MT
Bios: A09
Processors: 2 Pentium II Xeon 400's
Memory: 396 MB
Disks: 1 IDE 2 GB main boot, 1 SCSI-2 attached host 9 GB.

[7.1] Linux amr42a 2.4.0-test11 #1 SMP Wed Dec 6 19:33:04 EST 2000 i686 unknown
Kernel modules 2.3.11
Gnu C egcs-2.91.66
Gnu Make 3.77
Binutils 2.10.0.24
Linux C Library 2.1.3
Dynamic linker ldd: version 1.9.9
Procps 2.0.7
Mount 2.9v
Net-tools 1.55
Kbd command
Sh-utils 2.0
Modules Loaded

[7.2] /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 398.000780
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
features : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips : 796.26

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 398.000780
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
features : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips : 796.26

[7.3] No modules loaded.
[7.4] /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial(auto)
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0800-083f : Intel Corporation 82371AB PIIX4 ACPI
0840-085f : Intel Corporation 82371AB PIIX4 ACPI
0cf8-0cff : PCI conf1
dc00-dc7f : 3Com Corporation 3c905B 100BaseTX [Cyclone]
   dc00-dc7f : eth0
dce0-dcff : Intel Corporation 82371AB PIIX4 USB
e000-efff : PCI Bus #02
   e800-e8ff : Adaptec AIC-7880U
     e800-e8fe : aic7xxx
   ec00-ecff : Adaptec AHA-2940U2/W / 7890
     ec00-ecfe : aic7xxx
ffa0-ffaf : Intel Corporation 82371AB PIIX4 IDE
   ffa0-ffa7 : ide0

/proc/iomem
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000cc7ff : Extension ROM
000cc800-000ccfff : Extension ROM
000cd000-000cffff : Extension ROM
000f0000-000fffff : System ROM
00100000-17ffdfff : System RAM
   00100000-0025193f : Kernel code
   00251940-0026777f : Kernel data
17ffe000-17ffffff : reserved
f0000000-f3ffffff : Intel Corporation 440GX - 82443GX Host bridge
f5000000-f5ffffff : PCI Bus #02
f6000000-f6ffffff : PCI Bus #01
f9000000-faffffff : PCI Bus #02
   f9ffe000-f9ffefff : Adaptec AIC-7880U
   f9fff000-f9ffffff : Adaptec AHA-2940U2/W / 7890
fb000000-fdffffff : PCI Bus #01
   fb800000-fbffffff : Texas Instruments TVP4020 [Permedia 2]
   fc000000-fc7fffff : Texas Instruments TVP4020 [Permedia 2]
   fcfe0000-fcffffff : Texas Instruments TVP4020 [Permedia 2]
fe000000-fe00007f : 3Com Corporation 3c905B 100BaseTX [Cyclone]
fec00000-fec0ffff : reserved
fee00000-fee0ffff : reserved
ffe00000-ffffffff : reserved

[7.5] PCI information (lspci output)
00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
        Subsystem: Dell Computer Corporation: Unknown device 4087
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 64
        Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [a0] AGP version 1.0
                Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: fb000000-fdffffff
        Prefetchable memory behind bridge: f6000000-f6ffffff
        BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B+

00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01) (prog-if 80 [Master])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Region 4: I/O ports at ffa0 [size=16]

00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin D routed to IRQ 19
        Region 4: I/O ports at dce0 [size=32]

00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:11.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 24)
        Subsystem: Dell Computer Corporation 3C905B Fast Etherlink XL 10/100
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (2500ns min, 2500ns max), cache line size 08
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at dc00 [size=128]
        Region 1: Memory at fe000000 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at f8000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:13.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64, cache line size 08
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: f9000000-faffffff
        Prefetchable memory behind bridge: 00000000f5000000-00000000f5f00000
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=220mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
                Bridge: PM- B3+

01:00.0 VGA compatible controller: Texas Instruments TVP4020 [Permedia 2] (rev 01) (prog-if 00 [VGA])
        Subsystem: Diamond Multimedia Systems FIRE GL 1000 PRO
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fcfe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fc000000 (32-bit, non-prefetchable) [size=8M]
        Region 2: Memory at fb800000 (32-bit, non-prefetchable) [size=8M]
        Expansion ROM at 80000000 [disabled] [size=64K]
        Capabilities: [40] AGP version 1.0
                Status: RQ=31 SBA+ 64bit- FW- Rate=x1
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

02:0a.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
        Subsystem: Dell Computer Corporation: Unknown device 0087
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (9750ns min, 6250ns max), cache line size 08
        Interrupt: pin A routed to IRQ 18
        BIST result: 00
        Region 0: I/O ports at ec00 [size=256]
        Region 1: Memory at f9fff000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fa000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

02:0e.0 SCSI storage controller: Adaptec AIC-7880U (rev 01)
        Subsystem: Adaptec AIC-7880P Ultra/Ultra Wide SCSI Chipset
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (2000ns min, 2000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at e800 [size=256]
        Region 1: Memory at f9ffe000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at fa000000 [disabled] [size=64K]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

[7.6] /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
   Vendor: QUANTUM Model: VIKING II 9.1WLS Rev: 3506
   Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 05 Lun: 00
   Vendor: NEC Model: CD-ROM DRIVE:465 Rev: 1.03
   Type: CD-ROM ANSI SCSI revision: 02

[7.7] /proc/interrupts
            CPU0 CPU1
   0: 40942 31267 IO-APIC-edge timer
   1: 2 0 IO-APIC-edge keyboard
   2: 0 0 XT-PIC cascade
  14: 130167 103226 IO-APIC-edge ide0
  17: 1626 1404 IO-APIC-level eth0
  18: 3210 3195 IO-APIC-level aic7xxx, aic7xxx
NMI: 72143 72143
LOC: 72125 72124
ERR: 0

<end bug report>
