oops on P4 i875 w/ 2 gig RAM

From: Magnus Stenman
Date: Mon Nov 24 2003 - 06:43:20 EST


Hi!

I realize it's 2.6 time but this problem is driving me nuts. I've searched LKML, RH bugzilla, Google, etc and cannot find a resolution.


Machine has a Gigabyte GA-8IK1100 mobo (intel i875 chipset) with an Adaptec AHA-2940UW Pro PCI card, 3 SCSI drives, SDT-9000 DAT drive
Intel P4 2.60GHz with (and without) HT

2 gig DDR400 non-ecc dual channel (mobo manufacturer certified) samsung memory

Properly cooled case

No special kernel options
Redhat 7.3 with latest errata

Module Size Used by Not tainted
appletalk 28396 11 (autoclean)
3c59x 29192 1
st 30416 0
ext3 69600 4
jbd 51816 4 [ext3]
aic7xxx 134784 5
sd_mod 12828 10
scsi_mod 110876 3 [st aic7xxx sd_mod]

some oddities in dmesg; unexpected IO-APIC, detects some XEON? stuff


Symptoms:

The machine crashes after days or hours when using 2 gig memory.
When using 512Mb it runs fine for weeks.



I cannot come up with anything that triggers the crashes. Mem/disk/CPU load does not trigger it.

Tried both SMP and UP, (turned on and off HT in BIOS, booted SMP/UP kernel)

Replaced memory (different manufacturer, same specs)

Replaced motherboard (from ASUS->Gigabyte, roughly same specs)

Passes Memtest86



Anyone got any suggestions?


/magnus


attached two of the oopses (a couple more are attached compressed, including some with kernel BUGs in them)

ksymoops 2.4.4 on i686 2.4.20-20.7smp. Options used
-V (default)
-k /proc/ksyms (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.20-20.7smp/ (default)
-m /boot/System.map-2.4.20-20.7smp (default)
-i

Nov 20 01:45:47 lakrits kernel: Unable to handle kernel paging request at virtual address 8502a858
Nov 20 01:45:47 lakrits kernel: c015d0db
Nov 20 01:45:47 lakrits kernel: *pde = 00000000
Nov 20 01:45:47 lakrits kernel: Oops: 0000
Nov 20 01:45:47 lakrits kernel: CPU: 1
Nov 20 01:45:47 lakrits kernel: EIP: 0010:[<c015d0db>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Nov 20 01:45:47 lakrits kernel: EFLAGS: 00010282
Nov 20 01:45:47 lakrits kernel: eax: 8502a840 ebx: f0e29900 ecx: f4e29910 edx: ce84c930
Nov 20 01:45:47 lakrits kernel: esi: d414cf7c edi: 8502a840 ebp: 0000135e esp: c44c1f3c
Nov 20 01:45:47 lakrits kernel: ds: 0018 es: 0018 ss: 0018
Nov 20 01:45:47 lakrits kernel: Process kswapd (pid: 5, stackpage=c44c1000)
Nov 20 01:45:47 lakrits kernel: Stack: f7fb0000 ea2f6760 ea2f6760 f1bce580 ce84c918 ce84c900 f0e29900 c015a84b
Nov 20 01:45:47 lakrits kernel: f0e29900 c01485ac ea2f6760 c014ab7a 00000246 c44c1f84 00c4c5f5 c0118e94
Nov 20 01:45:47 lakrits kernel: c44c1f84 c44c1f84 00000000 00000000 00c4c5f5 00000100 c030c4a0 000001d0
Nov 20 01:45:47 lakrits kernel: Call Trace: [<c015a84b>] prune_dcache [kernel] 0xeb (0xc44c1f58))
Nov 20 01:45:47 lakrits kernel: [<c01485ac>] balance_dirty_state [kernel] 0xc (0xc44c1f60))
Nov 20 01:45:47 lakrits kernel: [<c014ab7a>] try_to_free_buffers [kernel] 0x13a (0xc44c1f68))
Nov 20 01:45:47 lakrits kernel: [<c0118e94>] schedule_timeout [kernel] 0x84 (0xc44c1f78))
Nov 20 01:45:47 lakrits kernel: [<c015ac40>] shrink_dcache_memory [kernel] 0x20 (0xc44c1fa0))
Nov 20 01:45:47 lakrits kernel: [<c013c513>] do_try_to_free_pages_kswapd [kernel] 0x13 (0xc44c1fa8))
Nov 20 01:45:47 lakrits kernel: [<c013c9e1>] kswapd [kernel] 0x141 (0xc44c1fd4))
Nov 20 01:45:47 lakrits kernel: [<c0105000>] stext [kernel] 0x0 (0xc44c1fe8))
Nov 20 01:45:47 lakrits kernel: [<c0107266>] arch_kernel_thread [kernel] 0x26 (0xc44c1ff0))
Nov 20 01:45:47 lakrits kernel: [<c013c8a0>] kswapd [kernel] 0x0 (0xc44c1ff8))
Nov 20 01:45:47 lakrits kernel: Code: 8b 47 18 85 c0 74 04 53 ff d0 58 68 dc 0a 31 c0 8d 43 2c 50

>>EIP; c015d0db <iput+3b/280> <=====
Trace; c015a84b <prune_dcache+eb/190>
Trace; c01485ac <balance_dirty_state+c/60>
Trace; c014ab7a <try_to_free_buffers+13a/150>
Trace; c0118e94 <schedule_timeout+84/a0>
Trace; c015ac40 <shrink_dcache_memory+20/30>
Trace; c013c513 <do_try_to_free_pages_kswapd+13/310>
Trace; c013c9e1 <kswapd+141/4e0>
Trace; c0105000 <_stext+0/0>
Trace; c0107266 <arch_kernel_thread+26/30>
Trace; c013c8a0 <kswapd+0/4e0>
Code; c015d0db <iput+3b/280>
00000000 <_EIP>:
Code; c015d0db <iput+3b/280> <=====
0: 8b 47 18 mov 0x18(%edi),%eax <=====
Code; c015d0de <iput+3e/280>
3: 85 c0 test %eax,%eax
Code; c015d0e0 <iput+40/280>
5: 74 04 je b <_EIP+0xb> c015d0e6 <iput+46/280>
Code; c015d0e2 <iput+42/280>
7: 53 push %ebx
Code; c015d0e3 <iput+43/280>
8: ff d0 call *%eax
Code; c015d0e5 <iput+45/280>
a: 58 pop %eax
Code; c015d0e6 <iput+46/280>
b: 68 dc 0a 31 c0 push $0xc0310adc
Code; c015d0eb <iput+4b/280>
10: 8d 43 2c lea 0x2c(%ebx),%eax
Code; c015d0ee <iput+4e/280>
13: 50 push %eax

ksymoops 2.4.4 on i686 2.4.20-20.7smp. Options used
-V (default)
-k /proc/ksyms (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.20-20.7smp/ (default)
-m /boot/System.map-2.4.20-20.7smp (default)
-i

Oct 22 02:19:26 lakrits kernel: Unable to handle kernel paging request at virtual address 00cd14a4
Oct 22 02:19:26 lakrits kernel: c01432a7
Oct 22 02:19:26 lakrits kernel: *pde = 00000000
Oct 22 02:19:26 lakrits kernel: Oops: 0000
Oct 22 02:19:26 lakrits kernel: CPU: 1
Oct 22 02:19:26 lakrits kernel: EIP: 0010:[<c01432a7>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 22 02:19:26 lakrits kernel: EFLAGS: 00010206
Oct 22 02:19:26 lakrits kernel: eax: 00000000 ebx: c44c0000 ecx: f66a1880 edx: c102a9d0
Oct 22 02:19:26 lakrits kernel: esi: c102a9d0 edi: 00cd1428 ebp: 000001f4 esp: c44c1f74
Oct 22 02:19:26 lakrits kernel: ds: 0018 es: 0018 ss: 0018
Oct 22 02:19:26 lakrits kernel: Process kscand (pid: 6, stackpage=c44c1000)
Oct 22 02:19:26 lakrits kernel: Stack: 0000001e 00000000 00000000 00000000 c44c1fac c102a9d0 c44c0000 c102a9d0
Oct 22 02:19:26 lakrits kernel: 00000000 000001f4 c013b3f1 c44c1fac c102a9b4 c030d558 00000001 c44c0000
Oct 22 02:19:26 lakrits kernel: c030c4a0 00000000 000001f4 c013d339 c030c4a0 00000000 00000001 c44c0000
Oct 22 02:19:26 lakrits kernel: Call Trace: [<c013b3f1>] scan_active_list [kernel] 0xe1 (0xc44c1f9c))
Oct 22 02:19:26 lakrits kernel: [<c013d339>] kscand [kernel] 0xc9 (0xc44c1fc0))
Oct 22 02:19:26 lakrits kernel: [<c0105000>] stext [kernel] 0x0 (0xc44c1fe8))
Oct 22 02:19:26 lakrits kernel: [<c0107266>] arch_kernel_thread [kernel] 0x26 (0xc44c1ff0))
Oct 22 02:19:26 lakrits kernel: [<c013d270>] kscand [kernel] 0x0 (0xc44c1ff8))
Oct 22 02:19:26 lakrits kernel: Code: 8b 47 7c 85 c0 0f 84 38 01 00 00 8b 35 b0 84 3b c0 90 8d b4

>>EIP; c01432a7 <page_referenced+177/320> <=====
Trace; c013b3f1 <scan_active_list+e1/6d0>
Trace; c013d339 <kscand+c9/1ce>
Trace; c0105000 <_stext+0/0>
Trace; c0107266 <arch_kernel_thread+26/30>
Trace; c013d270 <kscand+0/1ce>
Code; c01432a7 <page_referenced+177/320>
00000000 <_EIP>:
Code; c01432a7 <page_referenced+177/320> <=====
0: 8b 47 7c mov 0x7c(%edi),%eax <=====
Code; c01432aa <page_referenced+17a/320>
3: 85 c0 test %eax,%eax
Code; c01432ac <page_referenced+17c/320>
5: 0f 84 38 01 00 00 je 143 <_EIP+0x143> c01433ea <page_referenced+2ba/320>
Code; c01432b2 <page_referenced+182/320>
b: 8b 35 b0 84 3b c0 mov 0xc03b84b0,%esi
Code; c01432b8 <page_referenced+188/320>
11: 90 nop
Code; c01432b9 <page_referenced+189/320>
12: 8d b4 00 00 00 00 00 lea 0x0(%eax,%eax,1),%esi

Attachment: oops.resolved.gz
Description: Unix tar archive

Attachment: dmesg.gz
Description: Unix tar archive