Repeated hard crashes of 2.2.* system related to tcp_keepalive

From: Whit Blauvelt (whit@transpect.com)
Date: Thu Feb 24 2000 - 23:35:37 EST

Next message: Aaron Denney: "Obsolete variables in Makefiles?"
Previous message: Andre Hedrick: "Re: 2.3.47 tulip lockup"
Next in thread: David S. Miller: "Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive"
Reply: David S. Miller: "Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive"
Reply: Andre Hedrick: "Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[1.] One line summary of the problem:

Hard crashes of system related to tcp_keepalive

[2.] Full description of the problem/report:

The following frozen screen occurring between once a week and once a
day (with only the most minor differences, has appeared running kernels
1.2.13, 1.2.14 and now 1.2.15pre9, through replacement of the memory,
motherboard, CPU, brand of NICs, video card, and drive cables (what's
constant now is the Tekram DC-390F SCSI controller, the IBM HD, a
Western Digital HD used for /boot and swap, and the case and power supply):

[3.] Keywords (i.e., modules, networking, kernel):

tcp_keepalive crash

[4.] Kernel version (from /proc/version):

Linux version 2.2.15pre9 (root@china.patternbook.com) (gcc version
egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #2 Mon Feb 21 00:01 :18
EST 2000 (also 2.2.13 and 2.2.14 had identical problem)

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)

Unable to handle kernel NULL pointer dereference at virtual address 00000050
Oops: 0000
CPU: 0
EIP: 0010 [<c0177d5c>]
EFLAGS: 000010296
eax: 00000000 ebx: c2f83380 ecx: c2f835e8 edx: 000000c0
esi: 00000001 edi: 00000000 ebp: 00001585 esp: c021fef8
ds: 018 es: 0018 ss: 0018
Stack: 00000007 00000000 c02124b0 00000001 00000002 00b859ca 00000000 c01780e6
        c02124b0 00000000 c01780b4 000000ca c021ff4c c0111271 00000000 00000001
        c025c640 00b859bc 00000001 00000000 c021e000 c021ff60 c0117991 00000000
Call Trace: [<c01780e6>] [<c01780b4>] [<c0111271>] [<c0117991>] [<c010ald7>] [<c0109e38>] [<c0107859>]
                [<c0106000>] [<c010787c>] [<c0109000>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100176>]
Code: 8b 40 50 ff d0 83 c4 10 83 fe 10 75 06 ff 0d 40 e5 25 c0 66

>>EIP: c0177d5c <tcp_keepalive+e0/180>
Trace: c01780e6 <tcp_sltimer_handler+32/70>
Trace: c01780b4 <tcp_sltimer_handler+0/70>
Trace: c0111271 <timer_bh+331/388>
Trace: c0117991 <do_bottom_half+45/64>
Trace: c0106000 <get_options+0/74>
Code: c0177d5c <tcp_keepalive+e0/180> 00000000 <_EIP>: <===
Code: c0177d5c <tcp_keepalive+e0/180> 0: 8b 40 50 mov 0x50(%eax),%eax <===
Code: c0177d5f <tcp_keepalive+e3/180> 3: ff d0 call *%eax
Code: c0177d61 <tcp_keepalive+e5/180> 5: 83 c4 10 add $0x10,%esp
Code: c0177d64 <tcp_keepalive+e8/180> 8: 83 fe 10 cmp $0x10,%esi
Code: c0177d67 <tcp_keepalive+eb/180> b: 75 06 jne c0177d6f <tcp_keepalive+f3/180>
Code: c0177d69 <tcp_keepalive+ed/180> d: ff 0d 40 e5 25 c0 decl 0xc025e540
Code: c0177d6f <tcp_keepalive+f3/180> 13: 66 data16

Aiee, killing interrupt handler
Kernel panic: Attemted to kill the idle task:
In swapper task - not syncing

[6.] A small shell script or example program which triggers the
problem (if possible)

Not a clue. Happens at different times of day, in different load
conditions.

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)

Kernel modules 2.3.9
Gnu C egcs-2.91.66
Binutils 2.9.4.0.6
Linux C Library 2.1.1
Dynamic linker ldd (GNU libc) 2.1.1
Procps 2.0.2
Mount 2.9o
Net-tools 1.52
Console-tools 1999.03.02
Sh-utils 1.16
Modules Loaded 3c59x

[7.2.] Processor information (from /proc/cpuinfo):

processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 8
model name : AMD-K6(tm) 3D processor
stepping : 12
cpu MHz : 367.512290 (it's a 450 but I'm clocking it slow)
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mmx 3dnow
bogomips : 732.36

Note this also was happenning with an AMD 233

[7.3.] Module information (from /proc/modules):

3c59x 19332 2 (autoclean)

Note this was also happenning with 2 LinkSys (tulip) NICs rather than the 2
3Com now in use

[7.4.] SCSI information (from /proc/scsi/scsi)

General information:
  Chip sym53c875E, device id 0xf, revision id 0x26
  IO port address 0xc000, IRQ number 5
  Using memory mapped IO at virtual address 0xc4000000
  Synchronous period factor 12, max commands per lun 32

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

No idea what would be relevant in proc. The system is a DSL connected
firewall with masquerading for two other systems (although it fails
whether they are active or off), running DNS (bind 8.2.2 5), Apache,
Proftpd, sendmail.

[X.] Other notes, patches, fixes, workarounds:

Nope. All suggestions welcome. I'm ready to climb a tree and not come
down again.

Whit Blauvelt
whit@transpect.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Aaron Denney: "Obsolete variables in Makefiles?"
Previous message: Andre Hedrick: "Re: 2.3.47 tulip lockup"
Next in thread: David S. Miller: "Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive"
Reply: David S. Miller: "Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive"
Reply: Andre Hedrick: "Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Tue Feb 29 2000 - 21:00:11 EST