[Fwd: SMP/Zip/SCSI Lockups]

Ian Smith (ian@beaky.force9.co.uk)
Tue, 04 May 1999 00:13:34 +0100


This is a multi-part message in MIME format.
--------------A3B702F21DB477AA0E220026
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Ian Smith wrote:
>
> Eric,
>
> I hope you are the right person to tell about this, I got your address
> from drivers/scsi/scsi.h.
>
> Since adding a second Pentium II processor, I have experienced several
> complete system lockups while backing up my data to an external Zip
> drive. After a limited amount of testing (I only have one SCSI drive
> and one computer!) I believe SMP is the problem as I have not been able
> to reproduce it with a non-SMP kernel (both at 2.2.3 for these tests).
>
> Also, backups fail with CRC errors about 50% of the time with SMP.
>
> The lockups are typically (always?) triggered by mouse clicking in
> Netscape while a backup is in progress. Because I value my only
> machine, I looked at the console during backup and saw the following
> error(s) immediately prior to backup failure (with Netscape running but
> _not_ being used!).
>
> end_scsi_request: buffer-list destroyed
>
> Again, without SMP I do not get these messages, and my backups do not
> fail. My big assumption here is that the lockups are related to this
> error message.
>
> If you or anyone else can think of some non-destructive tests I would be
> happy to provide more data.
>
> Thanks,
>
> --
> ----------------------------------------------------------------------------
> Ian Smith
> ----------------------------------------------------------------------------

It's just happened again (kernel 2.2.7), locked solid whilst running in
a VT. I enclose a file I transcribed from the VT after the lockup. I
tried to use ksymoops on it but it wouldn't make :(

Unable to handle kernel paging request at virtual address 000044a4
current->tss.cr3=000101000, %cr3=00101000
*pde=00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01ab735>]
EFLAGS: 00010202
eax: 00000004 ebx: 0000037a ecx: 00000080 edx: 0000037C
esi: 000044a4 edi: 000044a4 ebp: c0210379 esp: c0397efc
ds: 0018 es: 0018 ss:0018
Process swapper (pid:0, process nr:1, stackpage=c0397000)
Stack: 00000000 00000000 006164c0 00000378 00000040 00000200 c01abce3
00000000
000044a4 00000200 c021519c c0084800 c021519c 00000000 00000001
0157fb46
c01b0378 c01ac2c5 c0084800 c021519c c0084800 c021519c c0397f8c
c010aa65
Call Trace: [<c01abce37>] [<co1b0378>] [<c01ac2c5>] [<c010aa65>]
[<c01abeab>] [<c0112852>]
[<c0119b65>] [<c0106000>] [<c010abbd>] [<c0108c9c>] [<c0106000>]
[<c0107347>] [<c01075b7>]
Code: f3 6f eb 11 8d 76 00 66 8b 54 24 14 66 83 c2 04 89 fe fc f3
Aiee, killing interrupt handler
Kernel panic: attempted to kill the idle task!
In interrupt handler - not syncing
_

This is what dmesg reports for SMP:

ian 2 $ dmesg
Linux version 2.2.7 (root@beaky) (gcc version 2.7.2.3) #1 SMP Thu Apr 29
19:28:07 BST 1999
Intel MultiProcessor Specification v1.1
Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 Pentium(tm) Pro APIC version 17
Processor #1 Pentium(tm) Pro APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Processors: 2
mapped APIC to ffffe000 (fee00000)
mapped IOAPIC to ffffd000 (fec00000)
Detected 233140909 Hz processor.
Console: colour VGA+ 132x44
Calibrating delay loop... 232.65 BogoMIPS
Memory: 128036k/131072k available (996k kernel code, 416k reserved,
1576k data, 48k init)
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.26 (19981001) Richard Gooch (rgooch@atnf.csiro.au)
per-CPU timeslice cutoff: 100.06 usecs.
CPU0: Intel Pentium II (Klamath) stepping 03
calibrating APIC timer ...
..... CPU clock speed is 233.1471 MHz.
..... system bus clock speed is 66.6132 MHz.
Booting processor 1 eip 2000
Calibrating delay loop... 232.65 BogoMIPS
OK.
CPU1: Intel Pentium II (Klamath) stepping 04
Total of 2 processors activated (465.31 BogoMIPS).
enabling symmetric IO mode... ...done.
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC pin 0, 16, 20, 21, 22, 23 not connected.
number of MP IRQ sources: 22.
number of IO-APIC registers: 24.
testing the IO APIC.......................
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 00170011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 0 0 0 0 0 1 1 59
02 0FF 0F 0 0 0 0 0 1 1 51
03 000 00 0 0 0 0 0 1 1 61
04 000 00 0 0 0 0 0 1 1 69
05 000 00 0 0 0 0 0 1 1 71
06 000 00 0 0 0 0 0 1 1 79
07 000 00 0 0 0 0 0 1 1 81
08 000 00 0 0 0 0 0 1 1 89
09 000 00 0 0 0 0 0 1 1 91
0a 000 00 0 0 0 0 0 1 1 99
0b 000 00 0 0 0 0 0 1 1 A1
0c 000 00 0 0 0 0 0 1 1 A9
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 0 0 0 0 0 1 1 B1
0f 000 00 0 0 0 0 0 1 1 B9
10 000 00 1 0 0 0 0 0 0 00
11 0FF 0F 1 1 0 1 0 1 1 C1
12 0FF 0F 1 1 0 1 0 1 1 C9
13 0FF 0F 1 1 0 1 0 1 1 D1
14 000 00 1 0 0 0 0 0 0 00
15 000 00 1 0 0 0 0 0 0 00
16 000 00 1 0 0 0 0 0 0 00
17 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 5
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ9 -> 9
IRQ10 -> 10
IRQ11 -> 11
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
IRQ17 -> 17
IRQ18 -> 18
IRQ19 -> 19
.................................... done.
mtrr: your CPUs had inconsistent variable MTRR settings
mtrr: probably your BIOS does not setup all CPUs

Hope you can help, as my only solution is to stop doing backups!
Regards,

-- 
----------------------------------------------------------------------------
Ian Smith
----------------------------------------------------------------------------
--------------A3B702F21DB477AA0E220026
Content-Type: text/plain; charset=us-ascii;
 name="the_oops.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="the_oops.txt"

Unable to handle kernel paging request at virtual address 000044a4 current->tss.cr3=000101000, %cr3=00101000 *pde=00000000 Oops: 0000 CPU: 1 EIP: 0010:[<c01ab735>] EFLAGS: 00010202 eax: 00000004 ebx: 0000037a ecx: 00000080 edx: 0000037C esi: 000044a4 edi: 000044a4 ebp: c0210379 esp: c0397efc ds: 0018 es: 0018 ss:0018 Process swapper (pid:0, process nr:1, stackpage=c0397000) Stack: 00000000 00000000 006164c0 00000378 00000040 00000200 c01abce3 00000000 000044a4 00000200 c021519c c0084800 c021519c 00000000 00000001 0157fb46 c01b0378 c01ac2c5 c0084800 c021519c c0084800 c021519c c0397f8c c010aa65 Call Trace: [<c01abce37>] [<co1b0378>] [<c01ac2c5>] [<c010aa65>] [<c01abeab>] [<c0112852>] [<c0119b65>] [<c0106000>] [<c010abbd>] [<c0108c9c>] [<c0106000>] [<c0107347>] [<c01075b7>] Code: f3 6f eb 11 8d 76 00 66 8b 54 24 14 66 83 c2 04 89 fe fc f3 Aiee, killing interrupt handler Kernel panic: attempted to kill the idle task! In interrupt handler - not syncing

--------------A3B702F21DB477AA0E220026--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/