Linux locking up - <Also look at the timebomb messages>

E.J. Wilburn (ej@ns1.woodtech.com)
Sun, 23 Jul 1995 03:56:05 -0500 (CDT)


Over the last few weeks, Linux has been locking up as often as once
every 24-48 hours on 3 separate systems in our office. The systems are
configured as follows -

User server -
AMD DX4/100
Acer mb w/AMI BIOS
16mb ram
32mb swap
1.2gb EIDE drive.
3com 3c509 network card
Trident 1mb ISA video card.
Promise VLB EIDE controller card
Linux 1.2.11 w/ELF, accounting and quota patches.
Accounting isn't active but quota is.
precompiled bash 1.14.4 ELF from sunsite
400+ users, averages 15 active users.

Terminal Server 1-
AMD DX2/80
Shuttle MB w/ AMI Bios
16mb ram
32mb swap
540mb HD
Generic IDE controller
Generic 1mb video card.
3com 3c509 network card.
Linux 1.2.11 w/ELF
pppd 2.2b3 950717
dip 3.3.7n
using default shells <mainly bash> from slackware 2.3
400+ users, averages 20 active users.
Cyclades cyclom 32ye 32 port card, 27 ports being used at 115.2k
running uugetty <latest version>

Terminal Server 2-
AMD DX2/80
Shuttle MB w/AMI Bios
16mb ram
32mb swap
540mb HD
Generic IDE controller
Generic 1mb video card.
3com 3c509 network card.
Linux 1.2.11 w/ELF
400+ users, averages 8 active.
pppd 2.2b3 950717
dip 3.3.7n
default shells <mainly bash> from slackware 2.3
Cyclades cyclom 16ye 16 port card, 16 ports being used at 57.6k
running uugetty<latest version>

The user server gets the most use and the most hammering by users, and it
locks up most often. Terminal server 1 is the second most used and the
second most likely to lock up, and Terminal server 2 rarely locks up.
Incidentally, our News server, running a 1.8gb SCSI HD and a 4GB DAT with
20mb ram on a DX2/80 and Linux 1.3.4 ELF, never locks up, and it has
intense network and HD activity <since we get a full news feed>.

When the systems lock up we can still switch VT's and ping the system.
When we telnet to the system it opens the connection but doesn't spawn a
login process or even run in.telnetd, and you're unable to type on the
VT. It seems that the kernel itself isn't locked up; it just can't spawn
new processes. After about 10-15 lockups I finally got a kernel panic
message, here it is -

General Protection: 0000
EIP: 0010:00753830
EFLAGS: 00010202
eax: 00ad4c04 ebx: 003e6a08 ecx: 003e6adc edx: 00753830
esi: fffffffc edi: 001b6418 ebp: 001977a0 esp: 00197794
ds: 0018 es: 0018 fs: 002b gs:0018 ss:0018
Corrupted stack page.
Process swapper (pid:0, process nr:0, stackpage=00195854)
Stack: 0011569a 00ad4c04 00000004 00000014 0011af3e 00000000 001977c4
0009e000 00101ffc 00001000 001102cd 00000006 0019788c 0019788c 0009e000
00101ffc 00001000 0019788c 00110018 00190018 0000002b 00190018 fffffffe
Call Trace: 0011569a 0011af3e 001102cd 00110018 0010f564 00110349 0010f113
00114ed9
Code: 12 0e e1 39 5d 05 3a 06 d6 c6 67 1e 21 93 f2 18 90 83 9a 09
Aieee, killing interrupt handler
kfree of non-kmalloced memory: 0019784c, next=003e6a10, order=1572833
task [0] (swapper) killed: unable to recover
kernel panic: Trying to free up swapper mem space.
In swapper task - not syncing
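
For anyone who wants to process a dump like this by script, here is a
rough Python sketch that pulls the return addresses out of the
"Call Trace:" section, including the continuation line. It is only an
illustration: the function name call_trace_addresses is made up for this
example, and it assumes the dump layout shown above (8-digit hex words,
trace ending at the first line that is not purely hex words).

```python
import re

# A few lines copied from the panic message above.
OOPS = """\
Stack: 0011569a 00ad4c04 00000004 00000014 0011af3e 00000000 001977c4
Call Trace: 0011569a 0011af3e 001102cd 00110018 0010f564 00110349 0010f113
00114ed9
Code: 12 0e e1 39 5d 05 3a 06 d6 c6 67 1e 21 93 f2 18 90 83 9a 09
"""

HEX_WORD = re.compile(r"\b[0-9a-f]{8}\b")
ONLY_HEX_WORDS = re.compile(r"\s*(?:[0-9a-f]{8}\s*)+")


def call_trace_addresses(text):
    """Return the addresses listed after 'Call Trace:' as integers.

    Hypothetical helper: assumes continuation lines contain nothing
    but 8-digit hex words, as in the dump above.
    """
    addrs = []
    in_trace = False
    for line in text.splitlines():
        if line.startswith("Call Trace:"):
            in_trace = True
            line = line[len("Call Trace:"):]
        elif in_trace and not ONLY_HEX_WORDS.fullmatch(line):
            break  # first non-address line ends the trace
        if in_trace:
            addrs.extend(int(word, 16) for word in HEX_WORD.findall(line))
    return addrs


for addr in call_trace_addresses(OOPS):
    print(f"{addr:08x}")
```

The addresses come out in the same order as in the dump, ready to be
looked up against the kernel's symbol list.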

Here's some of the call trace info -

0011569a - 00115670 T tqueue_bh
           001156c0 T immediate_bh
0011af3e - 0011af00 T do_bottom_half
           0011af80 T get_ioport_list
           0011aff0 t find_gap
001102cd - 00110270 T lcall7
           001102c0 t handle_bottom_half
           001102e0 t reschedule
           001102f0 T system_call
00110018 - 0010fed0 T do_signal
0010f564 - 0010f500 T sys_idle
           0010f570 T hard_reset_now
           0010f5c0 T show_regs
00110349 - 001102f0 T system_call
           00110390 T ret_from_sys_call
0010f113 - 0010efa0 T start_kernel
           0010f120 t printf
           0010f160 T init
00114ed9 - 00114ef0 T schedule
           00114f20 T sys_pause
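
The lookup behind a listing like the one above (find the last symbol that
starts at or below each trace address) can be sketched in Python roughly
as follows. This is a minimal sketch, not the procedure actually used for
this post: load_symbols and resolve are hypothetical names, the input is
assumed to be "address type name" lines, and the table is only an excerpt
of the symbols quoted above. With a partial table, any address that falls
between the excerpts will resolve to the wrong symbol.

```python
import bisect

# Excerpt of the symbols quoted above, as "address type name" lines.
SYMBOL_LINES = """\
0010f500 T sys_idle
0010f570 T hard_reset_now
00115670 T tqueue_bh
001156c0 T immediate_bh
0011af00 T do_bottom_half
0011af80 T get_ioport_list
"""


def load_symbols(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a symbol line
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) of the last symbol starting at or below addr.

    Returns None if addr lies below the first symbol in the table.
    """
    starts = [start for start, _name in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


symbols = load_symbols(SYMBOL_LINES)
for addr in (0x0011569a, 0x0011af3e, 0x0010f564):
    name, offset = resolve(symbols, addr)
    print(f"{addr:08x} -> {name}+0x{offset:x}")
# first line printed: 0011569a -> tqueue_bh+0x2a
```

Run against the kernel's full symbol list instead of this excerpt, the
same lookup should place every trace address in its containing function.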

Hope this helps get some results.

-E.J. Wilburn
System Administrator - Woodtech Information Systems, Inc.
ej@woodtech.com