[MM BUG report] 2.4.20-ac2 mm/filemap.c truncate_complete_page BUG()

From: Lionel Bouton (Lionel.Bouton@inet6.fr)
Date: Tue Jan 07 2003 - 09:27:52 EST


This happened 3 times, each time some (1-3) days after bootup

ksymoops output is attached

## Hardware/Kernel Config

The config was just upgraded from the following (where everything worked
nicely for weeks between reboots).

Bi-Celeron 433 on Abit BP6 wit 256 Mb RAM and latest RedHat 7.3 kernel
(at that time 2.4.18-18).

to this one :

Single Athlon XP 1700+ on ECS K7S5A (SiS735 chipset) with 128 Mb RAM and
2.4.20-ac2 kernel.

This was a drop in replacemnt (same disks, same software).

## Modules loaded (yes there are many of them)

Here are the modules loaded on current config :

Module Size Used by Not tainted
smbfs 37664 1 (autoclean)
nfsd 75872 8 (autoclean)
lockd 55936 1 (autoclean) [nfsd]
sunrpc 74804 1 (autoclean) [nfsd lockd]
parport_pc 28904 1 (autoclean)
lp 7552 0 (autoclean)
parport 35104 1 (autoclean) [parport_pc lp]
ppp_deflate 3904 2 (autoclean)
zlib_inflate 21664 0 (autoclean) [ppp_deflate]
zlib_deflate 20864 0 (autoclean) [ppp_deflate]
bsd_comp 5088 0 (autoclean)
it87 8256 0
i2c-proc 8192 0 [it87]
i2c-isa 1892 0 (unused)
i2c-core 20800 0 [it87 i2c-proc i2c-isa]
cls_u32 5572 1 (autoclean)
cls_fw 3072 1 (autoclean)
sch_sfq 4416 2 (autoclean)
sch_cbq 13952 1 (autoclean)
ipt_MARK 1408 18 (autoclean)
iptable_mangle 2816 1 (autoclean)
ipt_TCPMSS 3040 1 (autoclean)
ipt_limit 1536 2 (autoclean)
ipt_REJECT 3584 2 (autoclean)
n_hdlc 7136 1
ppp_synctty 6272 1
ipt_state 1088 19 (autoclean)
ipt_unclean 7552 1 (autoclean)
ipt_LOG 4160 5 (autoclean)
ip_nat_ftp 3776 0 (unused)
ip_conntrack_ftp 4864 1
iptable_nat 18868 2 (autoclean) [ip_nat_ftp]
ip_conntrack 24748 3 (autoclean) [ipt_state ip_nat_ftp
ip_conntrack_ftp iptable_nat]
ppp_async 8032 1
ppp_generic 19244 6 [ppp_deflate bsd_comp ppp_synctty
ppp_async]
iptable_filter 2336 1 (autoclean)
slhc 6412 1 [ppp_generic]
ip_tables 13568 12 [ipt_MARK iptable_mangle ipt_TCPMSS
ipt_limit ipt_REJECT ipt_state ipt_unclean ipt_LOG iptable_nat
iptable_filter]
sis900 14756 1
3c59x 27368 1
af_packet 13800 2 (autoclean)
ide-cd 31776 0 (autoclean)
cdrom 31968 0 (autoclean) [ide-cd]
md 48768 0 (unused)
loop 10128 0 (autoclean)
lvm-mod 49664 8 (autoclean)
usb-ohci 20192 0 (unused)
usbcore 71424 1 [usb-ohci]

Lm_sensors (it87 module here) was already used on previous config and
upgraded to v2.7.0 with i2c-2.7.0.
I can prevent lm_sensors and i2c modules from loading if needed.

The box is under heavy memory pressure with quasi constant swap in swap
out (personnal NAT box with numerous services running nfs, squid, apache
...). Current vmstat output is :

   procs memory swap io
system cpu
 r b w swpd free buff cache si so bi bo in cs us
sy id
 0 0 0 63888 6036 1628 27112 28 36 174 227 37 33
8 4 88

-> si = 28, so = 36

The perf is not stellar but is expected to be so under this kind of
memory pressure (512 MB stick was already ordered when first Oops came).
What's not expected is the oops attached.
The system does not crash but the process involved (always rrdtool
launched by crontab for the 3 Oopps) is stuck in D state and each
following process trying to access the same files are stucked in D
states too (as rrdtool is called each minute to update its databases and
graph nice graphs it leads to load average jumping to 200 in a matter of
3 hours...).

The code involved in filemap.c truncate_complete_page are the very first
lines :

[...]
        /* Page has already been removed from processes, by vmtruncate() */
        if (page->pte_chain)
                BUG();
[...]

For the record (probably not important) last time I checked the rrdtool
processes stucked in D state were the ones that graphed the load average
(so the file that posed problem was probably not the database but one of
the resulting png file).

Question : how do I get the current syscall of a process stucked in D
state remotely (Sysrq can do it but there's no screen attached so if I
could get the info remotely that would spare me temporary screen
relocations...).

I've a stock 2.4.20 kernel already compiled for testing (will be used on
next reboot, lm_sensors and i2c modules won't load).

LB.

ksymoops 2.4.4 on i686 2.4.20-ac2. Options used
     -v vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-ac2/ (default)
     -m /boot/System.map-2.4.20-ac2 (default)

Jan 6 01:13:04 twins kernel: kernel BUG at filemap.c:242!
Jan 6 01:13:04 twins kernel: invalid operand: 0000
Jan 6 01:13:04 twins kernel: CPU: 0
Jan 6 01:13:04 twins kernel: EIP: 0010:[<c01250ac>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jan 6 01:13:04 twins kernel: EFLAGS: 00010286
Jan 6 01:13:04 twins kernel: eax: c5563dd8 ebx: c10b2968 ecx: c10b2968 edx: c4b257e8
Jan 6 01:13:04 twins kernel: esi: c54be1a8 edi: 00000000 ebp: 00000000 esp: c5563d78
Jan 6 01:13:04 twins kernel: ds: 0018 es: 0018 ss: 0018
Jan 6 01:13:04 twins kernel: Process rrdtool (pid: 25745, stackpage=c5563000)
Jan 6 01:13:04 twins kernel: Stack: c10b2968 c012523a c10b2968 00000000 00000001 c5563dd8 00000000 40400000
Jan 6 01:13:04 twins kernel: 40016000 c3564404 c0e3a000 c1bc5874 00000000 00000081 c307b300 00000000
Jan 6 01:13:04 twins kernel: c307b35c c01bfefb 40015000 00000000 c5563dd8 00000000 c54be1a8 c01252db
Jan 6 01:13:04 twins kernel: Call Trace: [<c012523a>] [<c01bfefb>] [<c01252db>] [<c0123058>] [<c0147f64>]
Jan 6 01:13:04 twins kernel: [<c015923b>] [<c0126d8b>] [<c0126db9>] [<c01480d4>] [<c0132ffd>] [<c0132d5c>]
Jan 6 01:13:04 twins kernel: [<c01336b6>] [<c013eca4>] [<c0113111>] [<c01bfa0e>] [<c01344b4>] [<c01347e1>]
Jan 6 01:13:04 twins kernel: [<c0106c8b>]
Jan 6 01:13:04 twins kernel: Code: 0f 0b f2 00 be a4 20 c0 8b 43 30 85 c0 74 0e 6a 00 53 e8 ad

>>EIP; c01250ac <truncate_complete_page+c/60> <=====
Trace; c012523a <truncate_list_pages+13a/1a0>
Trace; c01bfefb <kfree_skbmem+b/60>
Trace; c01252db <truncate_inode_pages+3b/70>
Trace; c0123058 <vmtruncate+98/130>
Trace; c0147f64 <inode_setattr+24/d0>
Trace; c015923b <ext3_setattr+13b/180>
Trace; c0126d8b <filemap_nopage+bb/200>
Trace; c0126db9 <filemap_nopage+e9/200>
Trace; c01480d4 <notify_change+54/100>
Trace; c0132ffd <pte_chain_alloc+d/30>
Trace; c0132d5c <page_add_rmap+2c/80>
Trace; c01336b6 <do_truncate+46/60>
Trace; c013eca4 <open_namei+3d4/4f0>
Trace; c0113111 <__wake_up+41/60>
Trace; c01bfa0e <sock_def_write_space+2e/70>
Trace; c01344b4 <filp_open+34/60>
Trace; c01347e1 <sys_open+31/80>
Trace; c0106c8b <system_call+33/38>
Code; c01250ac <truncate_complete_page+c/60>
00000000 <_EIP>:
Code; c01250ac <truncate_complete_page+c/60> <=====
   0: 0f 0b ud2a <=====
Code; c01250ae <truncate_complete_page+e/60>
   2: f2 00 be a4 20 c0 8b repnz add %bh,0x8bc020a4(%esi)
Code; c01250b5 <truncate_complete_page+15/60>
   9: 43 inc %ebx
Code; c01250b6 <truncate_complete_page+16/60>
   a: 30 85 c0 74 0e 6a xor %al,0x6a0e74c0(%ebp)
Code; c01250bc <truncate_complete_page+1c/60>
  10: 00 53 e8 add %dl,0xffffffe8(%ebx)
Code; c01250bf <truncate_complete_page+1f/60>
  13: ad lods %ds:(%esi),%eax

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Jan 07 2003 - 22:00:35 EST