Re: Kernel mm/rmap.c oops in 2.6.11.3

From: Chris Wright
Date: Wed Mar 16 2005 - 17:11:56 EST


* Max Kamenetsky (maxk@xxxxxxxxxxxxxxxxxxxx) wrote:
> I've been seeing the following bug lately when running some memory- and
> CPU-intensive MATLAB jobs. MATLAB hangs, and commands like ps and top
> no longer work. The only solution I've found is to reboot. This
> happens intermittently, and here's what gets written to /var/log/syslog:

Questions...

1) Did you run memtest86 to make sure it's not bad memory?
2) Can it be reproduced on an untainted kernel (without the nvidia module)?
3) Hugh, do you have a debug patch Max could use to help chase this
particular one? (I think it's another member of the 'exclusive club'.)

> Mar 16 12:35:19 chinook kernel: kernel BUG at mm/rmap.c:482!
> Mar 16 12:35:19 chinook kernel: invalid operand: 0000 [#1]
> Mar 16 12:35:19 chinook kernel: PREEMPT
> Mar 16 12:35:19 chinook kernel: Modules linked in: nvidia
> Mar 16 12:35:19 chinook kernel: CPU: 0
> Mar 16 12:35:19 chinook kernel: EIP: 0060:[<c014a477>] Tainted: P      VLI
> Mar 16 12:35:19 chinook kernel: EFLAGS: 00010286 (2.6.11.3)
> Mar 16 12:35:19 chinook kernel: EIP is at page_remove_rmap+0x37/0x50
> Mar 16 12:35:19 chinook kernel: eax: ffffffff ebx: 00005000 ecx: 00000006 edx: c16a9920
> Mar 16 12:35:19 chinook kernel: esi: e3db1e34 edi: 00008000 ebp: c16a9920 esp: c8f4be54
> Mar 16 12:35:19 chinook kernel: ds: 007b es: 007b ss: 0068
> Mar 16 12:35:19 chinook kernel: Process MATLAB (pid: 30685, threadinfo=c8f4a000 task=ec1a9a80)
> Mar 16 12:35:19 chinook kernel: Stack: c013e418 00005000 c0142ed6 c16a9920 00000007 c0565a20 00000001 354c9067
> Mar 16 12:35:19 chinook kernel: 00000000 99388000 c0565578 99788000 e0f80994 99390000 00000000 c0143043
> Mar 16 12:35:19 chinook kernel: c0565578 e0f80990 99388000 00008000 00000000 99388000 e0f80994 99390000
> Mar 16 12:35:19 chinook kernel: Call Trace:
> Mar 16 12:35:19 chinook kernel: [<c013e418>] mark_page_accessed+0x28/0x30
> Mar 16 12:35:19 chinook kernel: [<c0142ed6>] zap_pte_range+0x166/0x280
> Mar 16 12:35:19 chinook kernel: [<c0143043>] zap_pmd_range+0x53/0x70
> Mar 16 12:35:19 chinook kernel: [<c014309a>] zap_pud_range+0x3a/0x60
> Mar 16 12:35:19 chinook kernel: [<c0143130>] unmap_page_range+0x70/0x90
> Mar 16 12:35:19 chinook kernel: [<c0143246>] unmap_vmas+0xf6/0x210
> Mar 16 12:35:19 chinook kernel: [<c0147bbb>] unmap_region+0x7b/0xf0
> Mar 16 12:35:19 chinook kernel: [<c0147ea6>] do_munmap+0x116/0x180
> Mar 16 12:35:19 chinook kernel: [<c0147f54>] sys_munmap+0x44/0x70
> Mar 16 12:35:19 chinook kernel: [<c01027db>] syscall_call+0x7/0xb
> Mar 16 12:35:19 chinook kernel: Code: 75 33 83 42 08 ff 0f 98 c0 84 c0 74 1a 8b 42 08 40 78 18 c7 44 24 04 ff ff ff ff c7 04 24 10 00 00 00 e8 9d f5 fe ff 83 c4 08 c3 <0f> 0b e2 01 7d a1 42 c0 eb de 0f 0b df 01 7d a1 42 c0 eb c3 90
> Mar 16 12:35:19 chinook kernel: <6>note: MATLAB[30685] exited with preempt_count 2
> Mar 16 12:35:19 chinook kernel: scheduling while atomic: MATLAB/0x00000002/30685
> Mar 16 12:35:19 chinook kernel: [<c040d3a2>] schedule+0x522/0x530
> Mar 16 12:35:19 chinook kernel: [<c040e19d>] rwsem_down_read_failed+0x9d/0x190
> Mar 16 12:35:19 chinook kernel: [<c012d414>] .text.lock.futex+0x7/0xf3
> Mar 16 12:35:19 chinook kernel: [<c02a6e80>] vt_console_print+0x60/0x300
> Mar 16 12:35:19 chinook kernel: [<c012d2b4>] do_futex+0x64/0xa0
> Mar 16 12:35:19 chinook kernel: [<c0117527>] __call_console_drivers+0x57/0x60
> Mar 16 12:35:19 chinook kernel: [<c012d3de>] sys_futex+0xee/0x100
> Mar 16 12:35:19 chinook kernel: [<c0117a58>] release_console_sem+0x98/0xf0
> Mar 16 12:35:19 chinook kernel: [<c0115178>] mm_release+0x98/0xa0
> Mar 16 12:35:19 chinook kernel: [<c0118ed9>] exit_mm+0x19/0x110
> Mar 16 12:35:19 chinook kernel: [<c0103d60>] do_invalid_op+0x0/0xd0
> Mar 16 12:35:19 chinook kernel: [<c0119910>] do_exit+0xa0/0x3d0
> Mar 16 12:35:19 chinook kernel: [<c0103d60>] do_invalid_op+0x0/0xd0
> Mar 16 12:35:19 chinook kernel: [<c010399d>] die+0x18d/0x190
> Mar 16 12:35:19 chinook kernel: [<c0103e0e>] do_invalid_op+0xae/0xd0
> Mar 16 12:35:19 chinook kernel: [<c014a477>] page_remove_rmap+0x37/0x50
> Mar 16 12:35:19 chinook kernel: [<c012872b>] rcu_process_callbacks+0x3b/0x40
> Mar 16 12:35:19 chinook kernel: [<c011c416>] tasklet_action+0x46/0x70
> Mar 16 12:35:19 chinook kernel: [<c011c1b8>] __do_softirq+0x78/0x90
> Mar 16 12:35:19 chinook kernel: [<c0104ba8>] do_IRQ+0x28/0x40
> Mar 16 12:35:19 chinook kernel: [<c0177691>] __mark_inode_dirty+0xd1/0x1c0
> Mar 16 12:35:19 chinook kernel: [<c01031ef>] error_code+0x2b/0x30
> Mar 16 12:35:19 chinook kernel: [<c014a477>] page_remove_rmap+0x37/0x50
> Mar 16 12:35:19 chinook kernel: [<c013e418>] mark_page_accessed+0x28/0x30
> Mar 16 12:35:19 chinook kernel: [<c0142ed6>] zap_pte_range+0x166/0x280
> Mar 16 12:35:19 chinook kernel: [<c0143043>] zap_pmd_range+0x53/0x70
> Mar 16 12:35:19 chinook kernel: [<c014309a>] zap_pud_range+0x3a/0x60
> Mar 16 12:35:19 chinook kernel: [<c0143130>] unmap_page_range+0x70/0x90
> Mar 16 12:35:19 chinook kernel: [<c0143246>] unmap_vmas+0xf6/0x210
> Mar 16 12:35:19 chinook kernel: [<c0147bbb>] unmap_region+0x7b/0xf0
> Mar 16 12:35:19 chinook kernel: [<c0147ea6>] do_munmap+0x116/0x180
> Mar 16 12:35:19 chinook kernel: [<c0147f54>] sys_munmap+0x44/0x70
> Mar 16 12:35:19 chinook kernel: [<c01027db>] syscall_call+0x7/0xb
>
>
> I haven't tried 2.6.11.4 yet, but based on what I see in the changelog,
> nothing related to the above seems to have been changed.

This particular problem predates 2.6.11, and you're right, 2.6.11.4
shouldn't make a difference.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net