Oops with 2.2.7 on Alpha

David Kastrup (dak@neuroinformatik.ruhr-uni-bochum.de)
Thu, 6 May 1999 19:40:19 +0200 (MET DST)


Hello,
first a word of caution: as I read the list only ocasionally, I'll be
quicker to answer any questions and response if you copy any reply of
this to the list to me. Thank you.

I want to thank you for the help you have provided with my Alpha
problem. After a lot of testing, I have been able to separate the
stability problems into two separate categories.

Category 1 is a hardware problem, pretty annoying. Basically, if I
dare to plug in any PCI device on the same PCI-Bus as the motherboard
SCSI-Controller (a NCR 53c875 (rev 3) chip) is sitting on, the system
will crash under heavy load regardless of the kernel in use. Since
this means that of our 6 PCI-Bus slots we will only be able to use the
single 64-Bit PCI-Slot, this is somewhat annoying.

But I have also been able to crash 2.2.7 applications under heavy
disk loads in a way that seems to be independent of this hardware
problem and that does not occur with the old 2.0.35 kernel.

Here is the relevant ksymoops info:

Options used: -V (default)
-o /lib/modules/2.2.7/ (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-m /usr/src/linux/System.map (default)
-c 1 (default)

You did not tell me where to find symbol information. I will assume
that the log matches the kernel and modules that are running right now
and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

No modules in ksyms, skipping objects
Unable to handle kernel paging request at virtual address 0018000000000040kswapd(4): Oops 0
pc = [<fffffc0000341084>] ra = [<fffffc0000330410>] ps = 0000
r0 = 0000000000000001 r1 = 0000000000000000 r2 = 0018000000000000
r3 = 000000000000ff00 r4 = 0000000000000b22 r5 = ffffffffffffffff
r6 = 0000000000001644 r7 = fffffc0000499020 r8 = fffffc000029c000
r9 = fffffc000053a010 r10= 0018000000000000 r11= 0018000000000000
r12= 0000000000000001 r13= fffffc000053a010 r14= fffffc0000310000
r15= 0000000000000000
r16= fffffc000053a010 r17= 0000000000000000 r18= fffffc0000002dc0
r19= 0000000000000001 r20= 0000001000000000 r21= 0000000000000000
r22= 0000000000000000 r23= 0000000000000001 r24= fffffc0000499008
r25= 0000000000000050 r27= fffffc0000341050 r28= 0000000000000000
gp = fffffc0000490228 sp = fffffc000029fdc0
Code: a56d0048 456b040a 454a0402 <a0220040> a54a0028 f4200003 a4220030 4428d001 e4200009

>>EIP: fffffc0000341084 <try_to_free_buffers+34/120>
Code: fffffc0000341078 <try_to_free_buffers+28/120> 0000000000000000 <_EIP>:
Code: fffffc0000341078 <try_to_free_buffers+28/120> 0: a5 6d 00 48 .long 0x48006da5
Code: fffffc000034107c <try_to_free_buffers+2c/120> 4: 45 6b 04 0a .long 0xa046b45
Code: fffffc0000341080 <try_to_free_buffers+30/120> 8: 45 4a 04 02 call_pal 0x2044a45
Code: fffffc0000341084 <try_to_free_buffers+34/120> c: a0 22 00 40 .long 0x400022a0 <===
Code: fffffc0000341088 <try_to_free_buffers+38/120> 10: a5 4a 00 28 ldbu v0,19109(v0)
Code: fffffc000034108c <try_to_free_buffers+3c/120> 14: f4 20 00 03 call_pal 0x30020f4
Code: fffffc0000341090 <try_to_free_buffers+40/120> 18: a4 22 00 30 ldwu v0,8868(v0)
Code: fffffc0000341094 <try_to_free_buffers+44/120> 1c: 44 28 d0 01 call_pal 0x1d02844
Code: fffffc0000341098 <try_to_free_buffers+48/120> 20: e4 20 00 09 .long 0x90020e4

Trace: [<fffffc0000330410>] [<fffffc0000338070>] [<fffffc0000338208>] [<fffffc0000310768>] [<fffffc0000338148>]

Trace: fffffc0000330410 <shrink_mmap+170/208>
Trace: fffffc0000338070 <do_try_to_free_pages+58/130>
Trace: fffffc0000338208 <kswapd+c0/168>
Trace: fffffc0000310768 <__kernel_thread+50/68>
Trace: fffffc0000338148 <kswapd+0/168>

3 warnings issued. Results may not be reliable.

Once the system has crashed in this way (high disk load (do a split of
200k strips on a Gigabyte file, for example) and parallel activity are
typically needed for that), it becomes increasingly easier to cause
this error to appear that will usually cause the application under
which it occurs to Segfault.

For the record, and for perhaps helping with a suggestion about the
PCI-Bus problem, here is /proc/pci which should also tell about most
of the involved hardware. If the grabber is pulled, this is the
setting that runs stably under 2.0.35: the problematic case appears
when a device is plugged into 32Bit Slots. They appear on PCI Bus 1.
It is sufficient to replug the Graphics card into PCI bus 1 to cause
the system to fail under high load even when the grabber is out, and
the old 2.0.35 kernel is used.

PCI devices found:
Bus 0, device 13, function 0:
PCI bridge: DEC DC21052 (rev 2).
Medium devsel. Fast back-to-back capable. Master Capable. Latency=32. Min Gnt=4.
Bus 0, device 14, function 0:
ISA bridge: Intel 82371SB PIIX3 ISA (rev 1).
Medium devsel. Fast back-to-back capable. Master Capable. No bursts.
Bus 0, device 14, function 1:
IDE interface: Intel 82371SB PIIX3 IDE (rev 0).
Medium devsel. Fast back-to-back capable. Master Capable. Latency=32.
I/O at 0x8800 [0x8801].
Bus 0, device 15, function 0:
Ethernet controller: DEC DC21142 (rev 48).
Medium devsel. Fast back-to-back capable. IRQ 44. Master Capable. Latency=32. Min Gnt=20.Max Lat=40.
I/O at 0x8000 [0x8001].
Non-prefetchable 32 bit memory at 0xb000000 [0xb000000].
Bus 0, device 17, function 0:
VGA compatible controller: Matrox Millennium II (rev 0).
Medium devsel. Fast back-to-back capable. IRQ 43. Master Capable. Latency=32.
Prefetchable 32 bit memory at 0x9000000 [0x9000008].
Non-prefetchable 32 bit memory at 0xa000000 [0xa000000].
Non-prefetchable 32 bit memory at 0xa800000 [0xa800000].
Bus 1, device 9, function 0:
Multimedia video controller: Brooktree Bt848 (rev 18).
Medium devsel. Fast back-to-back capable. IRQ 31. Master Capable. Latency=32. Min Gnt=16.Max Lat=40.
Prefetchable 32 bit memory at 0xb102000 [0xb102008].
Bus 1, device 13, function 0:
SCSI storage controller: NCR 53c875 (rev 3).
Medium devsel. IRQ 20. Master Capable. Latency=32. Min Gnt=17.Max Lat=64.
I/O at 0x9000 [0x9001].
Non-prefetchable 32 bit memory at 0xb100000 [0xb100000].
Non-prefetchable 32 bit memory at 0xb101000 [0xb101000].

The system is an UX4 board from Samsung. Do you have any ideas of
whether this can be a Linux problem (PCI bridge initialization?),
somehow solved by using the setup, or whether this is most probably a
hardware problem to be solved by sending the motherboard to the
vendor?

Thanks,

David Kastrup Phone: +49-234-700-5570
Email: dak@neuroinformatik.ruhr-uni-bochum.de Fax: +49-234-709-4209
Institut für Neuroinformatik, Universitätsstr. 150, 44780 Bochum, Germany

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/