PROBLEM: kernel 2.2.10 oopses every couple of days

Filip J. Bujanic (fbujanic@fikus.com)
Mon, 5 Jul 1999 13:55:41 -0400 (EDT)


[1.] One line summary of the problem:
Kernel oopses every couple of days, most of the time box needs cold rebooted.

[2.] Full description of the problem/report:
This machine is a news server. It runs INN 2.2 21-Jan-1999. The machine
is unusable after the oops. Last oops I was able to capture. INND died
but the box was functional.

[3.] Keywords (i.e., modules, networking, kernel):

[4.] Kernel version (from /proc/version):
Linux version 2.2.10 (root@monster) (gcc version 2.7.2.3) #14 Mon Jun 14
22:54:45 EDT 1999

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)
ksymoops 0.7c on i586 2.2.10. Options used
-v /usr/local/src/system/linux/vmlinux (specified)
-k /proc/ksyms (default)
-L (specified)
-o /lib/modules/2.2.10/ (default)
-m /usr/local/src/system/linux/System.map (specified)

No modules in ksyms, skipping objects
Unable to handle kernel paging request at virtual address 2e2d4760
current->tss.cr3 = 00c05000, %cr3 = 00c05000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0123bfe>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 2e2d4760 ebx: 000b76cb ecx: 00000821 edx: 2e2d4760
esi: 00000400 edi: 000b76cb ebp: c4200821 esp: c0c07e30
ds: 0018 es: 0018 ss: 0018
Process innd (pid: 261, process nr: 21, stackpage=c0c07000)
Stack: c0123c2d 00000821 000b76cb 00000400 c0123e93 00000821 000b76cb 00000400
000b76cb 00000000 00000001 c4208dd0 00000008 c0138c98 00000821 000b76cb
00000400 00000008 00000432 00000001 c4208dd0 00000002 cc0454c8 c0139031
Call Trace: [<c0123c2d>] [<c0123e93>] [<c0138c98>] [<c0139031>] [<c0137306>] [<c0137118>] [<c0122a76>]
[<c0137118>] [<c01225b1>] [<c01228a0>] [<c0107a84>]
Code: 8b 12 39 58 04 75 f3 39 70 08 75 ee 66 39 48 0c 75 e8 89 c2

>>EIP; c0123bfe <find_buffer+2a/44> <=====
Trace; c0123c2d <get_hash_table+15/20>
Trace; c0123e93 <getblk+1f/14c>
Trace; c0138c98 <block_getblk+a4/2bc>
Trace; c0139031 <ext2_getblk+181/22c>
Trace; c0137306 <ext2_file_write+1ee/580>
Trace; c0137118 <ext2_file_write+0/580>
Trace; c0122a76 <do_readv_writev+1ae/1f0>
Trace; c0137118 <ext2_file_write+0/580>
Trace; c01225b1 <sys_lseek+95/b8>
Trace; c01228a0 <sys_write+c4/ec>
Trace; c0107a84 <system_call+34/40>
Code; c0123bfe <find_buffer+2a/44>
00000000 <_EIP>:
Code; c0123bfe <find_buffer+2a/44> <=====
0: 8b 12 movl (%edx),%edx <=====
Code; c0123c00 <find_buffer+2c/44>
2: 39 58 04 cmpl %ebx,0x4(%eax)
Code; c0123c03 <find_buffer+2f/44>
5: 75 f3 jne fffffffa <_EIP+0xfffffffa> c0123bf8 <find_buffer+24/44>
Code; c0123c05 <find_buffer+31/44>
7: 39 70 08 cmpl %esi,0x8(%eax)
Code; c0123c08 <find_buffer+34/44>
a: 75 ee jne fffffffa <_EIP+0xfffffffa> c0123bf8 <find_buffer+24/44>
Code; c0123c0a <find_buffer+36/44>
c: 66 39 48 0c cmpw %cx,0xc(%eax)
Code; c0123c0e <find_buffer+3a/44>
10: 75 e8 jne fffffffa <_EIP+0xfffffffa> c0123bf8 <find_buffer+24/44>
Code; c0123c10 <find_buffer+3c/44>
12: 89 c2 movl %eax,%edx

[6.] A small shell script or example program which triggers the
problem (if possible)

[7.] Environment
This machine handles moderatly buisy news stream and it handels couple of
hundred news readers. It is multihomed on switched 100Mb lan.

[7.1.] Software (add the output of the ver_linux script here)
Linux monster 2.2.10 #14 Mon Jun 14 22:54:45 EDT 1999 i586 unknown
Kernel modules 2.1.121
Gnu C 2.7.2.3
Binutils 2.8.1.0.23
Linux C Library 5.4.46
Dynamic linker ldd: version 1.9.9
Linux C++ Library 2.8.
Procps 1.2.7
Mount 2.7l
Net-tools 1.52
Kbd 0.94
Sh-utils 1.16
Modules Loaded

INN 2.2 21-Jan-1999

[7.2.] Processor information (from /proc/cpuinfo):
processor : 0
vendor_id : GenuineIntel
cpu family : 5
model : 4
model name : Pentium MMX
stepping : 3
cpu MHz : 233.863145
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : yes
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 mmx
bogomips : 466.94

[7.3.] Module information (from /proc/modules):
no modules

[7.4.] SCSI information (from /proc/scsi/scsi)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE Model: ST32151N Rev: 0530
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: SEAGATE Model: ST410800N Rev: 0025
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: SEAGATE Model: ST15150N Rev: 0024
Type: Direct-Access ANSI SCSI revision: 02

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):
This machine has NCR53c810 SCSI controler..

[X.] Other notes, patches, fixes, workarounds:
Here is the output of the dmesg:
Linux version 2.2.10 (root@monster) (gcc version 2.7.2.3) #14 Mon Jun 14 22:54:45 EDT 1999
Detected 233863145 Hz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 466.94 BogoMIPS
Memory: 257824k/262144k available (976k kernel code, 408k reserved, 2904k data, 32k init)
CPU: Intel Pentium MMX stepping 03
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
Intel Pentium with F0 0F bug - workaround enabled.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfb2e0
PCI: Using configuration type 1
PCI: Probing PCI hardware
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0 for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
Starting kswapd v 1.5
Serial driver version 4.27 with no serial options enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
scsi-ncr53c7,8xx : at PCI bus 0, device 9, function 0
scsi-ncr53c7,8xx : NCR53c810 at memory 0xe6102000, io 0xe000, irq 11
scsi0 : burst length 8
scsi0 : NCR code relocated to 0x90610 (virt 0xc0090610)
scsi0 : test 1 started
scsi0 : NCR53c{7,8}xx (rel 17)
scsi : 1 host.
scsi0 : target 0 accepting asynchronous SCSI
scsi0 : setting target 0 to asynchronous SCSI
Vendor: SEAGATE Model: ST32151N Rev: 0530
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi0 : target 3 accepting asynchronous SCSI
scsi0 : setting target 3 to asynchronous SCSI
Vendor: SEAGATE Model: ST410800N Rev: 0025
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 3, lun 0
scsi0 : target 4 accepting asynchronous SCSI
scsi0 : setting target 4 to asynchronous SCSI
Vendor: SEAGATE Model: ST15150N Rev: 0024
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi0, channel 0, id 4, lun 0
scsi : detected 3 SCSI disks total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 4197405 [2049 MB] [2.0 GB]
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 17755614 [8669 MB] [8.7 GB]
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 8388315 [4095 MB] [4.1 GB]
eth0: Intel EtherExpress Pro 10/100 at 0xe800, 00:A0:C9:65:2C:65, IRQ 12.
Board assembly 667280-003, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x49caa8d6).
Receiver lock-up workaround activated.
tulip.c:v0.89H 5/23/98 becker@cesdis.gsfc.nasa.gov
eth1: Lite-On 82c168 PNIC at 0xe400, 00 a0 cc 3a 1e c2, IRQ 10.
eth1: MII transceiver found at MDIO address 1, config 1000 status 782d.
eth1: Advertising 01e1 on PHY 1, previously advertising 01e1.
Partition check:
sda: sda1 sda2 sda3 sda4
sdb: sdb1 sdb2
sdc: sdc1 sdc2
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 32k freed
Adding Swap: 130748k swap-space (priority -1)
eth1: Changing PNIC configuration to half-duplex, CSR6 812e0000.

thanks
fil

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/