Re: 3.2.0-rc5 NULL dereference BUG

From: Wu Fengguang
Date: Wed Jan 04 2012 - 20:56:33 EST


On Sun, Dec 18, 2011 at 07:32:37PM +0800, Wu Fengguang wrote:
> Yongqiang,
>
> Thanks for the quick fix!
>
> On Sun, Dec 18, 2011 at 03:17:18PM +0800, Yongqiang Yang wrote:
> > Hi Fengguang,
> >
> > Could you try the patch [ext4: do not reference pa_inode from group_pa]?
>
> It works! You can add my tested-by and CC stable.

The patch seems to fix only part of the problem. Today I got this slightly
different dmesg (the kernel has been patched with [ext4: do not reference
pa_inode from group_pa]):

[ 646.026574] BUG: unable to handle kernel NULL pointer dereference at 0000000000000178
[ 646.027004] IP: [<ffffffff810a5092>] __lock_acquire+0x8b/0x932
[ 646.027004] PGD 4f85067 PUD 99cb4067 PMD 0
[ 646.027004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 646.027004] CPU 6
[ 646.051405] Modules linked in:
[ 646.051405]
[ 646.051405] Pid: 6149, comm: dd Not tainted 3.2.0-rc5-ioless-full+ #1009 Supermicro X7DW3/X7DWN
[ 646.051405] RIP: 0010:[<ffffffff810a5092>] [<ffffffff810a5092>] __lock_acquire+0x8b/0x932
[ 646.051405] RSP: 0018:ffff880004ee18d8 EFLAGS: 00010097
[ 646.051405] RAX: 0000000000000000 RBX: 0000000000000170 RCX: 0000000000000000
[ 646.051405] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000170
[ 646.051405] RBP: ffff880004ee1948 R08: 0000000000000000 R09: 0000000000000000
[ 646.051405] R10: 0000000000000170 R11: ffffffff81175de4 R12: 0000000000000000
[ 646.051405] R13: 0000000000000000 R14: ffff880004fc4540 R15: 0000000000000000
[ 646.051405] FS: 00007f193aa90700(0000) GS:ffff880226a00000(0000) knlGS:0000000000000000
[ 646.051405] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 646.051405] CR2: 0000000000000178 CR3: 00000000b17cb000 CR4: 00000000000006e0
[ 646.051405] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 646.051405] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 646.051405] Process dd (pid: 6149, threadinfo ffff880004ee0000, task ffff880004fc4540)
[ 646.051405] Stack:
[ 646.051405] ffff880004ee18f8 ffffffff81099aa3 0000000000000006 0000000000000002
[ 646.051405] 0000000000000000 0000000000008010 ffff880225806b00 ffff88005fc08d68
[ 646.051405] ffff880004ee1978 0000000000000000 0000000000000170 0000000000000000
[ 646.051405] Call Trace:
[ 646.051405] [<ffffffff81099aa3>] ? sched_clock_local+0x12/0x75
[ 646.051405] [<ffffffff810a5a16>] lock_acquire+0xdd/0x10a
[ 646.051405] [<ffffffff81175de4>] ? create_empty_buffers+0x4a/0xc1
[ 646.051405] [<ffffffff8199f623>] _raw_spin_lock+0x36/0x69
[ 646.051405] [<ffffffff81175de4>] ? create_empty_buffers+0x4a/0xc1
[ 646.051405] [<ffffffff81175de4>] create_empty_buffers+0x4a/0xc1
[ 646.051405] [<ffffffff811efd2f>] ext4_discard_partial_page_buffers_no_lock+0x9f/0x406
[ 646.051405] [<ffffffff8199ffeb>] ? _raw_spin_unlock+0x2b/0x2f
[ 646.051405] [<ffffffff81170c26>] ? __mark_inode_dirty+0x1ac/0x1cc
[ 646.051405] [<ffffffff811767f3>] ? generic_write_end+0x6d/0x7f
[ 646.051405] [<ffffffff811f15e5>] ext4_da_write_end+0x244/0x2ed
[ 646.051405] [<ffffffff810ffeec>] generic_file_buffered_write+0x183/0x22d
[ 646.051405] [<ffffffff8107946a>] ? current_fs_time+0x27/0x2e
[ 646.051405] [<ffffffff8110198c>] __generic_file_aio_write+0x334/0x364
[ 646.051405] [<ffffffff8199e55c>] ? mutex_lock_nested+0x2e2/0x2f1
[ 646.051405] [<ffffffff81101a06>] ? generic_file_aio_write+0x4a/0xc1
[ 646.051405] [<ffffffff81101a22>] generic_file_aio_write+0x66/0xc1
[ 646.051405] [<ffffffff811ea020>] ext4_file_write+0x1f9/0x251
[ 646.051405] [<ffffffff8103c24b>] ? sched_clock+0x9/0xd
[ 646.051405] [<ffffffff8118180e>] ? fsnotify+0x216/0x26f
[ 646.051405] [<ffffffff8114d45e>] do_sync_write+0xce/0x10b
[ 646.051405] [<ffffffff8118180e>] ? fsnotify+0x216/0x26f
[ 646.051405] [<ffffffff8118166e>] ? fsnotify+0x76/0x26f
[ 646.051405] [<ffffffff8114dc1b>] vfs_write+0xb8/0x157
[ 646.051405] [<ffffffff8114ded2>] sys_write+0x4d/0x77
[ 646.051405] [<ffffffff819a6c02>] system_call_fastpath+0x16/0x1b
[ 646.051405] Code: bd 08 00 00 be d5 0b 00 00 48 c7 c7 86 41 d3 81 83 3d 82 f2 9f 01 00 0f 85 a4 08 00 00 e9 bb 03 00 00 41 83 fc 01 77 13 44 89 e0 <4c> 8b 6c c3 08 4d 85 ed 0f 85 5b 03 00 00 eb 34 41 83 fc 07 76
[ 646.051405] RIP [<ffffffff810a5092>] __lock_acquire+0x8b/0x932
[ 646.051405] RSP <ffff880004ee18d8>
[ 646.051405] CR2: 0000000000000178
[ 646.051405] ---[ end trace ebd0c8e3a842a6f1 ]---
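
For reference, the faulting address can be cross-checked against the
register dump. The sketch below is only a hand-rolled decoder for the one
instruction marked in the Code: line (bytes 4c 8b 6c c3 08); it is not a
general x86 disassembler, and the helper name is made up for illustration.
It decodes the instruction as a load from 0x8(%rbx,%rax,8) into %r13 and
recomputes the effective address from the RBX and RAX values above.

```python
# Register values copied from the oops above.
REGS = {"rax": 0x0, "rbx": 0x170}
CR2 = 0x178

NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_sib_disp8(code):
    """Decode 'REX 8b ModRM SIB disp8' (mov r64, [base + index*scale + disp8]).

    Hypothetical helper: handles only this one encoding shape.
    """
    rex, opcode, modrm, sib, disp = code
    assert rex & 0xF0 == 0x40 and opcode == 0x8B
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and rm == 4          # disp8 form with a SIB byte
    scale = 1 << (sib >> 6)
    index = ((sib >> 3) & 7) | ((rex & 2) << 2)   # REX.X extends index
    base = (sib & 7) | ((rex & 1) << 3)           # REX.B extends base
    dest = reg | ((rex & 4) << 1)                 # REX.R extends reg
    if disp > 127:                                # disp8 is signed
        disp -= 256
    return NAMES[dest], NAMES[base], NAMES[index], scale, disp


dest, base, index, scale, disp = decode_mov_sib_disp8(bytes.fromhex("4c8b6cc308"))
addr = REGS[base] + REGS[index] * scale + disp
print(f"mov {disp:#x}(%{base},%{index},{scale}),%{dest} -> {addr:#x}")
print("matches CR2:", addr == CR2)
```

The computed address equals CR2 (0x178), and RDI, the first argument
register, also holds 0x170. That would be consistent with __lock_acquire()
being handed a lock at offset 0x170 from a NULL base pointer, though the
trace alone does not say which structure that is.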

The test case runs 100 dd tasks on each of the 10 JBOD disks:

lkp-st02-x8664/JBOD-10HDD-thresh=100M/ext4-100dd-1-3.2.0-rc5-ioless-full+
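
A rough sketch of the shape of that workload, for anyone trying to
reproduce it. Only the counts (10 disks, 100 dd tasks per disk) come from
this report; the mount points, output file names, and block size are
hypothetical placeholders, and the script only builds the command lines
without running them.

```python
NUM_DISKS = 10       # from the report: 10 JBOD disks
DD_PER_DISK = 100    # from the report: 100 dd tasks per disk


def dd_commands(mount_points, tasks_per_disk):
    """Build one dd argument list per writer task (nothing is executed)."""
    cmds = []
    for mnt in mount_points:
        for i in range(tasks_per_disk):
            # Hypothetical output file name and block size.
            cmds.append(["dd", "if=/dev/zero", f"of={mnt}/zero-{i}", "bs=4k"])
    return cmds


# Hypothetical mount points, assuming one ext4 filesystem per disk.
mounts = [f"/mnt/disk{n}" for n in range(NUM_DISKS)]
cmds = dd_commands(mounts, DD_PER_DISK)

print(len(cmds), "dd tasks in total")
print(" ".join(cmds[0]))
```

Each argument list could be passed to subprocess.Popen to start the
writers concurrently; the actual test harness may well differ.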

Thanks,
Fengguang