tmpfs/ramfs with no swap oopses on Au1550/mips(el)

From: elmar gerdes
Date: Thu May 24 2007 - 18:42:55 EST



Hello,


I am experiencing oopses when using tmpfs heavily on an embedded

Au1550-based board (MIPS 32-bit core, little-endian, 128MB RAM).

Everything I need seemed to work (PCI, USB, JFFS2, busybox, tools like
mtools, WLAN, network, ...). The system currently uses busybox-1.00 and
glibc-2.3.2. Everything has been compiled using mipsel-linux-gcc 3.3.3
created from Dan Kegel's crosstool-0.38.


But when I use tmpfs heavily using mcopy or appending to a file using

cat filename >> /tmp/xxx

I either get "No space left on device" (kernel 2.6.21-rc7) which is what
I expect, or I get a sequence of oopses (2.6.14, 2.6.18.2, 2.6.19.2).
When getting "No space left on device", inserting a USB-stick might then
cause an oops ("Bad page state in process 'hotplug'").


I have seen various types of oopses, but they all(?)/usually start with
one like these:

CPU 0 Unable to handle kernel paging request at virtual address\
5da7bee4, epc == 801795c0, ra == 802b61c8

CPU 0 Unable to handle kernel paging request at virtual address\
00000000, epc == 8015b728, ra == 8015ba20

CPU 0 Unable to handle kernel paging request at virtual address\
000000c4, epc == 80160b58, ra == 801604c4

CPU 0 Unable to handle kernel paging request at virtual address\
000000c4, epc == 80160b58, ra == 801604c4

CPU 0 Unable to handle kernel paging request at virtual address\
000000d4, epc == 8015e724, ra == 8015dee8

CPU 0 Unable to handle kernel paging request at virtual address\
00000104, epc == 801748bc, ra == 80174844

CPU 0 Unable to handle kernel paging request at virtual address\
00000000, epc == 80159858, ra == 80159b70

CPU 0 Unable to handle kernel paging request at virtual address\
24ba4d88, epc == 801ace7c, ra == 801b1e84

CPU 0 Unable to handle kernel paging request at virtual address\
00000128, epc == 80155628, ra == 8015575c



I tested this using kernel 2.6.19.2, 2.6.18.2, 2.6.14, and 2.6.21-rc7
like this:



# uname -a
Linux mybox 2.6.18.2 #1 Tue May 22 16:58:12 CEST 2007 mips unknown


df(1) under 2.6.18.2 does not show /var, being 40M of tmpfs as 2.6.19.2
would tell us. /tmp is a symlink to /var/tmp.


Filling tmpfs:

# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc

# ll -h /tmp/abc*
-rw-r--r-- 1 root root 9.1M Jan 1 00:02 /tmp/abc

# cp /tmp/abc /tmp/abc2
# cp /tmp/abc /tmp/abc3
# cp /tmp/abc /tmp/abc4

# ll -h /tmp/abc*
-rw-r--r-- 1 root root 9.1M Jan 1 00:02 /tmp/abc
-rw-r--r-- 1 root root 9.1M Jan 1 00:02 /tmp/abc2
-rw-r--r-- 1 root root 9.1M Jan 1 00:02 /tmp/abc3
-rw-r--r-- 1 root root 9.1M Jan 1 00:02 /tmp/abc4

# cp /tmp/abc /tmp/abc5



On 2.6.14, 2.6.18.2, and 2.6.19.2 I get this:


CPU 0 Unable to handle kernel paging request at virtual address 00000000,\
epc == 80159858, ra == 80159b70
Oops[#1]:
Cpu 0
$ 0 : 00000000 1000fc00 8f818018 810a1458
$ 4 : 00100100 00000000 00200200 803c2054
$ 8 : 0000475d 00000001 00000001 fffffff8
$12 : 0055a000 00000000 00001000 03200008
$16 : 803c2028 1000fc01 803c2048 803c2028
$20 : 00000000 00000000 000280d2 00000000
$24 : 00000000 2ac6342c
$28 : 804c6000 804c7d90 803c2550 80159b70
Hi : 307d2bf7
Lo : a60e8300
epc : 80159858 buffered_rmqueue+0x9c/0x1f4 Not tainted
ra : 80159b70 get_page_from_freelist+0x10c/0x134
Status: 1000fc02 KERNEL EXL
Cause : 0080000c
BadVA : 00000000
PrId : 03030200
Modules linked in: aes blowfish llc vfat fat
Process cp (pid: 853, threadinfo=804c6000, task=804d2500)
Stack : 804d2500 00000000 00000000 00000000 803c2550 00000000 00000000 00000000
00000044 00000000 00000004 80159b70 100d0eb0 87b54e60 00000001 802327fc
00000044 100cf050 803c2550 804d2500 000280d2 87c8abc8 00000000 00000000
803c2550 00000558 00000010 80159c00 000280d2 80170d28 801536ac 801549a0
00001000 386d4434 80704880 00000000 00000000 87c8abc8 87c8ac28 00000000
...
Call Trace:
[<80159b70>] get_page_from_freelist+0x10c/0x134
[<802327fc>] radix_tree_insert+0x190/0x1e0
[<80159c00>] __alloc_pages+0x68/0x2fc
[<80170d28>] shmem_swp_alloc+0xb4/0x254
[<801536ac>] unlock_page+0x68/0xd8
[<801549a0>] file_read_actor+0x0/0x100
[<80171d94>] shmem_getpage+0x234/0x65c
[<80172824>] shmem_file_write+0x160/0x2b0
[<80172d08>] shmem_file_read+0x70/0x80
[<8017a1d4>] vfs_write+0xd4/0x1a0
[<801340f0>] update_process_times+0x58/0x90
[<801340d4>] update_process_times+0x3c/0x90
[<8017a394>] sys_write+0x54/0xa0
[<8010d380>] stack_done+0x20/0x3c


Code: 8c650004 34840100 34c60200 <aca20000> ac450004 ac640000\
ac660004 8e020020 2472ffe8
Break instruction in kernel code[#2]:
Cpu 0
(((oops #2 follows)))

Code: 00000040 080579ca 8cc300d4 <0000800d> 080579bf 00000000\
27bdffe8 afb00010 afbf0014
Fixing recursive fault but reboot is needed!
Unhandled kernel unaligned access[#3]:
Cpu 0
(((oops #3 follows)))

Code: 30630001 14600044 00000000 <8c820000> 000211c2 30420001\
1040003c 00000000 8c90001c
Break instruction in kernel code[#4]:
Cpu 0
(((oops #4 follows)))

Code: 0000800d 0805d97b 00000000 <0000800d> 0805d979 8ca20018\
0805d973 8ca5000c 27bdffe0
Kernel panic - not syncing: Attempted to kill init!
Break instruction in kernel code[#5]:
Cpu 0
(((oops #5 follows)))

Code: 8fb00010 03e00008 27bd0020 <0000800d> 0805d9c7 8c840018\
0805d9c1 8c84000c 03e00008
Kernel panic - not syncing: Aiee, killing interrupt handler!
(((no more messages)))



On 2.6.21-rc7 I get this:

Bad page state in process 'hotplug'
page:8100cd80 flags:0x00000000 mapping:00000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
[<80109964>] dump_stack+0x8/0x34
[<8015aa1c>] bad_page+0x6c/0xb0
[<8015ba50>] free_hot_cold_page+0x1b8/0x1c8
[<8015ba70>] free_hot_page+0x10/0x1c
[<8015ff38>] __page_cache_release+0x138/0x2ec
[<8016020c>] put_page+0x68/0xd4
[<80157398>] filemap_nopage+0x368/0x578
[<8016b930>] do_no_page+0x90/0x470
[<8016c118>] __handle_mm_fault+0x168/0x3dc
[<8010e2c0>] do_page_fault+0x100/0x360
[<80103a80>] ret_from_exception+0x0/0x20

Bad page state in process 'hotplug'
page:8100cd80 flags:0x00080000 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
(((same backtrace as before, then
CPU 0 Unable to handle kernel paging request at virtual address\
00100104, epc == 8015adc4, ra == 8015ba34
(((further messages)))



Some thoughts on this:

- I can reproduce the oops whether root is a JFFS2 partition or
NFS-root, so I expect that no MTD/JFFS2-stuff is involved.

- Before I used 40M of tmpfs, I had 64M (of the 128M of RAM). Filling
that tmpfs partition lead to the oops when about 40M were reached.
Using 32M of tmpfs leads to "No space left on device" (as was seen
for kernel 2.6.21-rc7). But inserting a USB-stick can already
provoke the oops after "No space left"...

- I tried (a) ramfs and (b) a large ramdisk instead of tmpfs, as I
expected that the missing swap (as backing store) for tmpfs to be the
reason. But both lead to oopses. So I expect that the missing swap
is not the reason.

- Since swap seems to not be involved, I expect that commit
6ebba0e2f56ee77270a9ef8e92c1b4ec38e5f419 ([MIPS] Fix swap entry for
MIPS32 36-bit physical address) is not involved either.

- I could not reproduce the oopses on a x86 system, so I think this
could/should be mips-specific.



Does someone have an idea on how to fix this?


Thank you

Elmar

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/