Re[2]: PROBLEM: page allocation or what in 2.6.8.1
From: Harry Edmon
Date: Tue Aug 31 2004 - 11:40:42 EST
We believe the hardware is okay. We have run numerous memtests, all okay. This
is a Tyan S2721-533 with dual 3.06 Xeons.
We tried taking out CONFIG_DEBUG_PAGEALLOC and we still get crashes, especially
in nfsd. We have now moved to the 2.6.8.1-mm4 kernel and got the following
crash:
kfree_debugcheck: bad ptr f8c189fch.
------------[ cut here ]------------
kernel BUG at mm/slab.c:1833!
invalid operand: 0000 [#1]
PREEMPT SMP
Modules linked in: nfs autofs4 nfsd exportfs lockd sunrpc capability commoncap
ipv6 eepro100 joydev tsdev usbhid uhci_hcd usbcore evdev e100 mii sd_mod 3w_xxxx
scsi_mod ide_cd cdrom rtc unix
CPU: 0
EIP: 0060:[<c0143d0c>] Not tainted VLI
EFLAGS: 00010086 (2.6.8.1-debug-mm)
EIP is at kfree_debugcheck+0x4c/0x70
eax: 00000028 ebx: c1718300 ecx: c03729bc edx: c03729bc
esi: f8c189fc edi: f8c189fc ebp: f2efbeac esp: f2efbe9c
ds: 007b es: 007b ss: 0068
Process rpc.mountd (pid: 2529, threadinfo=f2efa000 task=f2ede0b0)
Stack: c0339fa0 f8c189fc f8c189fc f7d5124c f2efbed0 c0144cb5 f8c189fc f2efbeb4
f7d51bcc 00000286 f7d5124c f7d5124c efdd925d f2efbee0 f8bc7a48 f8c189fc
f7d51bcc f2efbf0c f8bc80d9 f7d5124c f8bddfc0 00000000 f2efa000 f8bdef28
Call Trace:
[<c0106f69>] show_stack+0x80/0x96
[<c0107100>] show_registers+0x15f/0x1c3
[<c010730a>] die+0x10d/0x1a8
[<c010782b>] do_invalid_op+0x104/0x106
[<c0106b81>] error_code+0x2d/0x38
[<c0144cb5>] kfree+0x25/0x9f
[<f8bc7a48>] ip_map_put+0x45/0x6e [sunrpc]
[<f8bc80d9>] ip_map_lookup+0x2ce/0x3a9 [sunrpc]
[<f8bc8215>] auth_unix_add_addr+0x61/0x9f [sunrpc]
[<f8c02eb9>] exp_addclient+0xb2/0xbc [nfsd]
[<f8bfa9ca>] nfsctl_transaction_write+0x6e/0x98 [nfsd]
[<c01869cb>] sys_nfsservctl+0xc0/0x115
[<c01060a5>] sysenter_past_esp+0x52/0x71
Code: e3 05 03 1d 90 95 48 c0 8b 03 a9 80 00 00 00 74 0a 8b 5d f8 8b 75 fc 89 ec
5d c3 89 74 24 04 c7 04 24 a0 9f 33 c0 e8 f0 a6 fd ff <0f> 0b 29 07 d6 92 33 c0
eb dc 89 74 24 04 c7 04 24 e0 9f 33 c0
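
For anyone who wants to see which module each frame of the trace above belongs to, here is a rough Python sketch that splits "[<addr>] symbol+0xoff/0xsize [module]" lines into fields. The regular expression, the function name parse_frames, and the "vmlinux" label for frames without a module tag are my own assumptions for illustration; this is not an existing kernel tool.

```python
import re

# Matches call-trace lines such as "[<f8bc7a48>] ip_map_put+0x45/0x6e [sunrpc]".
# The pattern is an assumption based on the 2.6-era i386 oops layout shown above.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\][ \t]+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(text):
    """Return one dict per call-trace frame found in text (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            # Frames without a "[module]" suffix are labelled "vmlinux" here;
            # that label is an assumption made for readability.
            "module": m.group("module") or "vmlinux",
        })
    return frames


# A few frames copied from the trace in this message.
TRACE = """\
[<c0144cb5>] kfree+0x25/0x9f
[<f8bc7a48>] ip_map_put+0x45/0x6e [sunrpc]
[<f8bc80d9>] ip_map_lookup+0x2ce/0x3a9 [sunrpc]
[<f8bc8215>] auth_unix_add_addr+0x61/0x9f [sunrpc]
[<f8c02eb9>] exp_addclient+0xb2/0xbc [nfsd]
"""

if __name__ == "__main__":
    for frame in parse_frames(TRACE):
        print(f"{frame['module']:8} {frame['symbol']}+{frame['offset']:#x}")
```

Feeding it a whole oops works the same way, since lines that do not look like frames are simply skipped.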
Andrew Morton <akpm@xxxxxxxx> wrote:
> Harry Edmon <harry@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I have had another crash on the same system as my message of 23 August:
> >
> > Unable to handle kernel paging request at virtual address cf0b4e1c
>
> hm. Is the hardware known to be good?
>
> > ...
> >
> > Before the crash I see messages like the following:
> >
> > oom-killer: gfp_mask=0xd0
>
> That's because you've enabled CONFIG_DEBUG_PAGEALLOC. It enormously
> increases the size of slab objects, which seems to cause memory reclaim to
> blow up. (It shouldn't but it does. It's a low-priority problem though).
>
> It's unlikely that the oom-killing caused the oops, but it's possible I
> guess. There's supposed to be a dump_stack() in the out_of_memory() path,
> which would help in searching for bugs, but that seems to have got lost.
--
Dr. Harry Edmon E-MAIL: harry@xxxxxxxxxxxxxxxxxxxx
206-543-0547 harry@xxxxxxxxxxxxxxxx
Dept of Atmospheric Sciences FAX: 206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640