Re: Kernel 3.0: Instant kernel crash when mounting CIFS (also crashes with linux-3.1-rc2)

From: Justin Piszcz
Date: Wed Aug 17 2011 - 17:53:11 EST




On Wed, 17 Aug 2011, Arnaud Lacombe wrote:

Hi,

On Wed, Aug 17, 2011 at 4:45 PM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:


On Wed, 17 Aug 2011, Jeff Layton wrote:

The crash is happening in the bowels of the slab allocator.
Specifically, it looks like it's hitting this:

              /*
               * The slab was either on partial or free list so
               * there must be at least one object available for
               * allocation.
               */
              BUG_ON(slabp->inuse >= cachep->num);

...which looks like maybe the accounting of in-use objects is off. This
really sounds like some sort of memory corruption. I've not been able
to reproduce this so far, but I also had someone report a panic here that
might be related:

  https://bugzilla.redhat.com/show_bug.cgi?id=731278

One thing that might be helpful is turning on page poisoning and
redoing this test; that might make it crash sooner and point out the
source of the corruption.

Even better would be a bisect to track down the cause...


Hi Jeff,

root@acerlw:/usr/src/linux# grep CONFIG_PAGE_POISONING .config
root@acerlw:/usr/src/linux# ls -l ../linux
lrwxrwxrwx 1 root root 13 Aug 17 14:41 ../linux -> linux-3.1-rc2/
root@acerlw:/usr/src/linux#

In what kernel is that feature available, or, how do I enable it?

It is selected by "Kernel hacking" -> "Debug page memory allocations",
provided your arch supports pagealloc debug.

- Arnaud
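
A quick way to confirm which of the two options ended up in the build is to check the tree's .config directly. This is only a sketch: the helper name config_enabled is made up, and it assumes the relevant symbols are CONFIG_DEBUG_PAGEALLOC and CONFIG_PAGE_POISONING. On architectures with native pagealloc debug support, probably only the former gets set, which would explain an empty grep for CONFIG_PAGE_POISONING.

```shell
# Hypothetical helper: succeed if a Kconfig symbol is built in (=y) in a .config file.
# $1 = path to the .config, $2 = symbol name without the CONFIG_ prefix.
config_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1" 2>/dev/null
}

# Assumed symbol names; adjust if your tree spells them differently.
for sym in DEBUG_PAGEALLOC PAGE_POISONING; do
    if config_enabled .config "$sym"; then
        echo "CONFIG_$sym is enabled"
    else
        echo "CONFIG_$sym is not set"
    fi
done
```

Run it from the top of the kernel source tree after "make menuconfig" (or oldconfig) has written the new .config.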

Hi,

Thanks, a larger dump below with that option enabled:

[ 478.103032] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 478.103049] CPU 1
[ 478.103052] Modules linked in: bnep rfcomm bluetooth speedstep_lib cryptd aes_x86_64 aes_generic configfs ohci_hcd ssb ath9k mac80211 uvcvideo ath9k_common ath9k_hw ath videodev mmc_core video edac_core k10temp edac_mce_amd v4l2_compat_ioctl32 i2c_piix4 battery cfg80211 ac pcmcia shpchp pci_hotplug wmi pcmcia_core rfkill
[ 478.103107]
[ 478.103113] Pid: 3978, comm: echo Not tainted 3.1.0-rc2 #3 Acer Aspire 7551 /Aspire 7551
[ 478.103126] RIP: 0010:[<ffffffff8134e839>] [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.103144] RSP: 0018:ffff88012e0f1e88 EFLAGS: 00010282
[ 478.103150] RAX: ffff88013b65d740 RBX: 000000000000002a RCX: ffff88012e0f1f48
[ 478.103155] RDX: ffffffff8199c18c RSI: ffff88013b7da490 RDI: 9440ffff88013273
[ 478.103161] RBP: ffff88012e0f1e88 R08: 00007fcc4da01700 R09: ffff88013b7da490
[ 478.103166] R10: 0000000000000000 R11: 0000000000000246 R12: 9440ffff88013273
[ 478.103172] R13: 00007fcc4da0e000 R14: ffff8801388e7bc0 R15: ffff8801388e7bc0
[ 478.103179] FS: 00007fcc4da01700(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
[ 478.103185] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 478.103190] CR2: 00007fcc4d538380 CR3: 000000013277c000 CR4: 00000000000006e0
[ 478.103195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 478.103201] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 478.103207] Process echo (pid: 3978, threadinfo ffff88012e0f0000, task ffff88012e15a150)
[ 478.103212] Stack:
[ 478.103215] ffff88012e0f1ef8 ffffffff8134f17b 0000000000000022 00000007fcc4da0e
[ 478.103227] ffff88012e0f1f18 ffffffff8109f2c7 000000002308a472 ffff88013a9df800
[ 478.103237] 0000000000000003 000000000000002a 00007fcc4da0e000 ffff88012e0f1f48
[ 478.103246] Call Trace:
[ 478.103255] [<ffffffff8134f17b>] tty_write+0x3b/0x290
[ 478.103266] [<ffffffff8109f2c7>] ? do_mmap_pgoff+0x357/0x370
[ 478.103274] [<ffffffff810b6e6a>] vfs_write+0xaa/0x160
[ 478.103280] [<ffffffff810b7155>] sys_write+0x45/0x90
[ 478.103290] [<ffffffff8164debb>] system_call_fastpath+0x16/0x1b
[ 478.103295] Code: 00 00 00 00 00 48 89 df e8 c5 f0 d5 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 ff 48 89 e5 74 0c
[ 478.103336] 3f 01 54 00 00 75 2b 31 c0 5d c3 8b 76 44 48 89 d1 48 c7 c7
[ 478.103357] RIP [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.103366] RSP <ffff88012e0f1e88>
[ 478.103372] ---[ end trace df8e9f10dc5e941d ]---
[ 478.103700] general protection fault: 0000 [#2] SMP DEBUG_PAGEALLOC
[ 478.103711] CPU 0
[ 478.103715] Modules linked in: bnep rfcomm bluetooth speedstep_lib cryptd aes_x86_64 aes_generic configfs ohci_hcd ssb ath9k mac80211 uvcvideo ath9k_common ath9k_hw ath videodev mmc_core video edac_core k10temp edac_mce_amd v4l2_compat_ioctl32 i2c_piix4 battery cfg80211 ac pcmcia shpchp pci_hotplug wmi pcmcia_core rfkill
[ 478.103766]
[ 478.103772] Pid: 3933, comm: atd Tainted: G D 3.1.0-rc2 #3 Acer Aspire 7551 /Aspire 7551
[ 478.103785] RIP: 0010:[<ffffffff8134e839>] [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.103803] RSP: 0018:ffff880139749e88 EFLAGS: 00010282
[ 478.103808] RAX: ffff88013b65d740 RBX: 0000000000000013 RCX: ffff880139749f48
[ 478.103814] RDX: ffffffff8199c18c RSI: ffff88013b7da490 RDI: 9440ffff88013273
[ 478.103820] RBP: ffff880139749e88 R08: 0000000000000000 R09: ffff88013b7da490
[ 478.103825] R10: 0000000000000000 R11: 0000000000000246 R12: 9440ffff88013273
[ 478.103831] R13: 00007fff04275cb0 R14: ffff8801388e7bc0 R15: ffff8801388e7bc0
[ 478.103838] FS: 00007fd613e52700(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 478.103844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 478.103849] CR2: 00007fd613a051d5 CR3: 000000013a034000 CR4: 00000000000006f0
[ 478.103854] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 478.103859] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 478.103866] Process atd (pid: 3933, threadinfo ffff880139748000, task ffff88013f2c3910)
[ 478.103870] Stack:
[ 478.103873] ffff880139749ef8 ffffffff8134f17b 0000000000000f8a 0000000000000000
[ 478.103885] ffffffff810349fd 00007fff04275cec 0000000000000004 0000000000000000
[ 478.103894] 0000000000000000 0000000000000013 00007fff04275cb0 ffff880139749f48
[ 478.103903] Call Trace:
[ 478.103913] [<ffffffff8134f17b>] tty_write+0x3b/0x290
[ 478.103924] [<ffffffff810349fd>] ? do_fork+0x13d/0x210
[ 478.103932] [<ffffffff810b6e6a>] vfs_write+0xaa/0x160
[ 478.103938] [<ffffffff810b7155>] sys_write+0x45/0x90
[ 478.103948] [<ffffffff8164debb>] system_call_fastpath+0x16/0x1b
[ 478.103953] Code: 00 00 00 00 00 48 89 df e8 c5 f0 d5 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 ff 48 89 e5 74 0c
[ 478.103995] 3f 01 54 00 00 75 2b 31 c0 5d c3 8b 76 44 48 89 d1 48 c7 c7
[ 478.104016] RIP [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.104025] RSP <ffff880139749e88>
[ 478.104072] ---[ end trace df8e9f10dc5e941e ]---
[ 478.104333] general protection fault: 0000 [#3] SMP DEBUG_PAGEALLOC
[ 478.104352] CPU 0
[ 478.104357] Modules linked in: bnep rfcomm bluetooth speedstep_lib cryptd aes_x86_64 aes_generic configfs ohci_hcd ssb ath9k mac80211 uvcvideo ath9k_common ath9k_hw ath videodev mmc_core video edac_core k10temp edac_mce_amd v4l2_compat_ioctl32 i2c_piix4 battery cfg80211 ac pcmcia shpchp pci_hotplug wmi pcmcia_core rfkill
[ 478.104405]
[ 478.104410] Pid: 3933, comm: atd Tainted: G D 3.1.0-rc2 #3 Acer Aspire 7551 /Aspire 7551
[ 478.104422] RIP: 0010:[<ffffffff8134e839>] [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.104434] RSP: 0018:ffff880139749b28 EFLAGS: 00010282
[ 478.104439] RAX: ffff88013f344382 RBX: 9440ffff88013273 RCX: 0000000000000000
[ 478.104444] RDX: ffffffff8199c20d RSI: ffff88013b7da490 RDI: 9440ffff88013273
[ 478.104450] RBP: ffff880139749b28 R08: 0000000000000000 R09: 0000000000000000
[ 478.104455] R10: ffff8801388e7bd0 R11: 0000000000000001 R12: 0000000000000008
[ 478.104460] R13: ffff8801388e7bc0 R14: ffff88013b65d740 R15: ffff88013b7da490
[ 478.104467] FS: 00007fd613e52700(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 478.104473] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 478.104478] CR2: 00007fd613a051d5 CR3: 0000000001c1d000 CR4: 00000000000006f0
[ 478.104483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 478.104488] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 478.104494] Process atd (pid: 3933, threadinfo ffff880139748000, task ffff88013f2c3910)
[ 478.104498] Stack:
[ 478.104501] ffff880139749bd8 ffffffff8134fe31 ffff880139749b48 ffffffff810d259a
[ 478.104512] ffff880139749b98 ffffffff810b85df ffff88012e08b288 ffff88013a154390
[ 478.104520] 0000000000000000 ffff8801388e7bd0 ffff88013b7da490 0000000800000001
[ 478.104529] Call Trace:
[ 478.104538] [<ffffffff8134fe31>] tty_release+0x41/0x550
[ 478.104546] [<ffffffff810d259a>] ? mntput+0x1a/0x30
[ 478.104554] [<ffffffff810b85df>] ? fput+0x15f/0x200
[ 478.104561] [<ffffffff810b8552>] fput+0xd2/0x200
[ 478.104570] [<ffffffff810b50c1>] filp_close+0x61/0x90
[ 478.104578] [<ffffffff810384bf>] put_files_struct+0x7f/0xe0
[ 478.104585] [<ffffffff810385c4>] exit_files+0x44/0x50
[ 478.104591] [<ffffffff81038bc4>] do_exit+0x5f4/0x790
[ 478.104600] [<ffffffff81036e94>] ? kmsg_dump+0x44/0xe0
[ 478.104609] [<ffffffff81004f25>] oops_end+0x75/0xa0
[ 478.104615] [<ffffffff81005093>] die+0x53/0x80
[ 478.104623] [<ffffffff81002854>] do_general_protection+0x154/0x160
[ 478.104631] [<ffffffff8164da7f>] general_protection+0x1f/0x30
[ 478.104641] [<ffffffff8134e839>] ? tty_paranoia_check+0x9/0x70
[ 478.104649] [<ffffffff8134f17b>] tty_write+0x3b/0x290
[ 478.104657] [<ffffffff810349fd>] ? do_fork+0x13d/0x210
[ 478.104664] [<ffffffff810b6e6a>] vfs_write+0xaa/0x160
[ 478.104670] [<ffffffff810b7155>] sys_write+0x45/0x90
[ 478.104679] [<ffffffff8164debb>] system_call_fastpath+0x16/0x1b
[ 478.104683] Code: 00 00 00 00 00 48 89 df e8 c5 f0 d5 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 ff 48 89 e5 74 0c
[ 478.104724] 3f 01 54 00 00 75 2b 31 c0 5d c3 8b 76 44 48 89 d1 48 c7 c7
[ 478.104744] RIP [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.104753] RSP <ffff880139749b28>
[ 478.104757] ---[ end trace df8e9f10dc5e941f ]---
[ 478.104761] Fixing recursive fault but reboot is needed!
[ 478.152105] general protection fault: 0000 [#4] SMP DEBUG_PAGEALLOC
[ 478.152117] CPU 1
[ 478.152120] Modules linked in: bnep rfcomm bluetooth speedstep_lib cryptd aes_x86_64 aes_generic configfs ohci_hcd ssb ath9k mac80211 uvcvideo ath9k_common ath9k_hw ath videodev mmc_core video edac_core k10temp edac_mce_amd v4l2_compat_ioctl32 i2c_piix4 battery cfg80211 ac pcmcia shpchp pci_hotplug wmi pcmcia_core rfkill
[ 478.152171]
[ 478.152177] Pid: 3936, comm: danted Tainted: G D 3.1.0-rc2 #3 Acer Aspire 7551 /Aspire 7551
[ 478.152190] RIP: 0010:[<ffffffff8134e839>] [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.152208] RSP: 0018:ffff880138abdd18 EFLAGS: 00010286
[ 478.152213] RAX: ffff88013f344300 RBX: 88012e1400003000 RCX: 0000000000000000
[ 478.152219] RDX: ffffffff8199c20d RSI: ffff88013b5732f0 RDI: 88012e1400003000
[ 478.152224] RBP: ffff880138abdd18 R08: 0000000000000000 R09: 0000000000000000
[ 478.152229] R10: ffff8801388e7150 R11: 0000000000000001 R12: 0000000000000008
[ 478.152235] R13: ffff8801388e7140 R14: ffff88012fa19d80 R15: ffff88013b5732f0
[ 478.152241] FS: 00007fe3e8099700(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
[ 478.152247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 478.152253] CR2: 00007fe3e7bad520 CR3: 0000000001c1d000 CR4: 00000000000006e0
[ 478.152258] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 478.152263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 478.152269] Process danted (pid: 3936, threadinfo ffff880138abc000, task ffff88012e0c1810)
[ 478.152274] Stack:
[ 478.152277] ffff880138abddc8 ffffffff8134fe31 ffff880138abdd38 ffffffff810d259a
[ 478.152288] ffff880138abdd88 ffffffff810b85df ffff8801326ea3d8 ffff8801325a5250
[ 478.152297] 0000000000000000 ffff8801388e7150 ffff88013b5732f0 0000000800000001
[ 478.152307] Call Trace:
[ 478.152317] [<ffffffff8134fe31>] tty_release+0x41/0x550
[ 478.152326] [<ffffffff810d259a>] ? mntput+0x1a/0x30
[ 478.152334] [<ffffffff810b85df>] ? fput+0x15f/0x200
[ 478.152341] [<ffffffff810b8552>] fput+0xd2/0x200
[ 478.152350] [<ffffffff810b50c1>] filp_close+0x61/0x90
[ 478.152358] [<ffffffff810384bf>] put_files_struct+0x7f/0xe0
[ 478.152365] [<ffffffff810385c4>] exit_files+0x44/0x50
[ 478.152372] [<ffffffff81038bc4>] do_exit+0x5f4/0x790
[ 478.152380] [<ffffffff810baf46>] ? vfs_stat+0x16/0x20
[ 478.152387] [<ffffffff810bb285>] ? sys_newstat+0x15/0x30
[ 478.152394] [<ffffffff810b7040>] ? vfs_read+0x120/0x160
[ 478.152402] [<ffffffff8103908f>] do_group_exit+0x3f/0xa0
[ 478.152409] [<ffffffff81039102>] sys_exit_group+0x12/0x20
[ 478.152418] [<ffffffff8164debb>] system_call_fastpath+0x16/0x1b
[ 478.152423] Code: 00 00 00 00 00 48 89 df e8 c5 f0 d5 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 ff 48 89 e5 74 0c
[ 478.152464] 3f 01 54 00 00 75 2b 31 c0 5d c3 8b 76 44 48 89 d1 48 c7 c7
[ 478.152485] RIP [<ffffffff8134e839>] tty_paranoia_check+0x9/0x70
[ 478.152495] RSP <ffff880138abdd18>
[ 478.152501] ---[ end trace df8e9f10dc5e9420 ]---
[ 478.152505] Fixing recursive fault but reboot is needed!
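
A side note on the register dumps above: in all four oopses RDI (the first argument under the x86-64 calling convention) holds either 9440ffff88013273 or 88012e1400003000. Neither looks like a canonical x86-64 address (bits 63..47 must all be equal), and dereferencing a non-canonical pointer raises a general protection fault rather than a page fault, which would fit the memory corruption theory. A rough sketch of that check, with a made-up helper name, assuming 48-bit virtual addressing and a 16-digit lowercase hex string as input:

```shell
# Hypothetical helper: succeed if a 16-digit lowercase hex address is canonical
# under 48-bit virtual addressing, i.e. bits 63..47 are all ones or all zeros.
is_canonical() {
    case "$1" in
        ffff[89a-f]???????????) return 0 ;;   # upper (kernel) half
        0000[0-7]???????????)   return 0 ;;   # lower (user) half
        *)                      return 1 ;;
    esac
}

# RDI values from the dumps above, plus RAX from the first oops for comparison.
for addr in 9440ffff88013273 88012e1400003000 ffff88013b65d740; do
    if is_canonical "$addr"; then
        echo "$addr canonical"
    else
        echo "$addr non-canonical"
    fi
done
```

The string match avoids shell arithmetic on purpose, since 64-bit values with the top bit set are handled differently across shells.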

Justin.