Re: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)

From: Jan Stancek
Date: Mon Dec 07 2015 - 10:19:07 EST


On 12/07/2015 12:40 PM, Jan Stancek wrote:
> Hi,
>
> "ADSP018" test from LTP[1] is triggering BUG_ON below reliably for me on 4.4.0-rc4.
> I'll start a bisect - if someone already sees a suspect/culprit that could narrow
> it down, please let me know.
>
> # ./aiodio_sparse -i 4 -a 8k -w 16384k -s 65536k -n 2
> aiodio_sparse 0 TINFO : Dirtying free blocks
> aiodio_sparse 0 TINFO : Starting I/O tests
> aiodio_sparse 0 TINFO : Killing childrens(s)
>
> [ 637.250251] ------------[ cut here ]------------
> [ 637.255404] kernel BUG at mm/filemap.c:238!
> [ 637.260069] invalid opcode: 0000 [#1] SMP
> [ 637.264655] Modules linked in: loop x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ipmi_devintf ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif lpc_ich sg pcspkr shpchp i2c_i801 mfd_core ipmi_si winbond_cir parport_pc rc_core parport ipmi_msghandler video nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod sd_mod cdrom mgag200 drm_kms_helper igb syscopyarea sysfillrect sysimgblt ptp fb_sys_fops pps_core ttm dca i2c_algo_bit drm ahci libahci crc32c_intel libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
> [ 637.328054] CPU: 6 PID: 22523 Comm: aiodio_sparse Not tainted 4.4.0-rc4 #1
> [ 637.335723] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.01.0002.041520151123 04/15/2015
> [ 637.347173] task: ffff880437fab200 ti: ffff8804379f4000 task.ti: ffff8804379f4000
> [ 637.355522] RIP: 0010:[<ffffffff811cd141>] [<ffffffff811cd141>] delete_from_page_cache+0x81/0x90
> [ 637.365433] RSP: 0018:ffff8804379f7978 EFLAGS: 00010246
> [ 637.371358] RAX: 002fffff80020028 RBX: ffffea000fe71c40 RCX: 0000000000000000
> [ 637.379319] RDX: ffff88043e410220 RSI: 0000000000000000 RDI: ffffea000fe71c40
> [ 637.387280] RBP: ffff8804379f79a0 R08: 0000000000000000 R09: 0000000000000001
> [ 637.395241] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880430b543b8
> [ 637.403202] R13: ffff8804379f79f0 R14: 0000000000000964 R15: 0000000000000000
> [ 637.411161] FS: 00007fd344bab740(0000) GS:ffff88043e400000(0000) knlGS:0000000000000000
> [ 637.420188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 637.426598] CR2: 00007ffc755f8aef CR3: 0000000001ad6000 CR4: 00000000003406e0
> [ 637.434560] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 637.442518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 637.450479] Stack:
> [ 637.452718] ffffea000fe71c40 ffff880430b543b8 ffff8804379f79f0 0000000000000964
> [ 637.461009] 0000000000000000 ffff8804379f79c0 ffffffff811dee66 ffffffffffffffff
> [ 637.469299] ffff8804379f7a60 ffff8804379f7b10 ffffffff811df2cb 0000000000000000
> [ 637.477590] Call Trace:
> [ 637.480316] [<ffffffff811dee66>] truncate_inode_page+0x56/0x90
> [ 637.486922] [<ffffffff811df2cb>] truncate_inode_pages_range+0x3eb/0x760
> [ 637.494399] [<ffffffff811df6ac>] truncate_inode_pages_final+0x4c/0x60
> [ 637.501695] [<ffffffffa02b2537>] xfs_fs_evict_inode+0x77/0x1b0 [xfs]
> [ 637.508881] [<ffffffff8127c22f>] evict+0xaf/0x180
> [ 637.514223] [<ffffffff8127cc6f>] iput+0x1af/0x290
> [ 637.519566] [<ffffffff812763fc>] __dentry_kill+0x17c/0x1e0
> [ 637.525782] [<ffffffff812776ad>] dput+0x25d/0x310
> [ 637.531126] [<ffffffff81277470>] ? dput+0x20/0x310
> [ 637.536566] [<ffffffff8125f904>] __fput+0x1a4/0x240
> [ 637.542102] [<ffffffff8125f9de>] ____fput+0xe/0x10
> [ 637.547542] [<ffffffff810b91a7>] task_work_run+0x77/0xa0
> [ 637.553565] [<ffffffff81098fdf>] do_exit+0x33f/0xc60
> [ 637.559199] [<ffffffff8109998c>] do_group_exit+0x4c/0xc0
> [ 637.565221] [<ffffffff810a7a11>] get_signal+0x331/0x8f0
> [ 637.571147] [<ffffffff8101d3c7>] do_signal+0x37/0x680
> [ 637.576878] [<ffffffff81113ab3>] ? rcu_read_lock_sched_held+0x93/0xa0
> [ 637.584160] [<ffffffff8123303e>] ? kfree+0x1ae/0x270
> [ 637.589794] [<ffffffff8108f2e4>] ? exit_to_usermode_loop+0x33/0xac
> [ 637.596785] [<ffffffff8108f30f>] exit_to_usermode_loop+0x5e/0xac
> [ 637.603584] [<ffffffff81003d0b>] syscall_return_slowpath+0xbb/0x130
> [ 637.610673] [<ffffffff81761ada>] int_ret_from_sys_call+0x25/0x9f
> [ 637.617469] Code: e8 65 3d 59 00 4c 89 ef e8 1d 72 07 00 4d 85 f6 74 06 48 89 df 41 ff d6 48 89 df e8 1a fb 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [ 637.639172] RIP [<ffffffff811cd141>] delete_from_page_cache+0x81/0x90
> [ 637.646464] RSP <ffff8804379f7978>


So, according to the bisect, the first bad commit is:

commit 68985633bccb6066bf1803e316fbc6c1f5b796d6
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Tue Dec 1 14:04:04 2015 +0100

sched/wait: Fix signal handling in bit wait helpers

Vladimir reported getting RCU stall warnings and bisected it back to
commit:

743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")

That commit inadvertently reversed the calls to schedule() and signal_pending(),
thereby not handling the case where the signal receives while we sleep.

Reported-by: Vladimir Murzin <vladimir.murzin@xxxxxxx>
Tested-by: Vladimir Murzin <vladimir.murzin@xxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: mark.rutland@xxxxxxx
Cc: neilb@xxxxxxx
Cc: oleg@xxxxxxxxxx
Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
Link: http://lkml.kernel.org/r/20151201130404.GL3816@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

which, it seems to me, is only exposing a problem elsewhere.
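
For reference, here is a minimal userspace sketch of the ordering issue the commit message above describes. It is not the kernel code: fake_task, fake_schedule() and the two wait_*() helpers are hypothetical names made up for illustration, and fake_schedule() merely stands in for schedule() by optionally marking a signal as pending "during the sleep". The return values are a modelling choice as well.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task: only tracks whether a signal is pending. */
struct fake_task {
	bool signal_pending;
};

/*
 * Stand-in for schedule(): the task "sleeps", and a signal may be
 * delivered while it is asleep.
 */
static void fake_schedule(struct fake_task *t, bool signal_during_sleep)
{
	if (signal_during_sleep)
		t->signal_pending = true;
}

/*
 * Ordering the commit message calls reversed: check for a signal,
 * then sleep. A signal delivered during the sleep is not reported.
 */
static int wait_check_then_sleep(struct fake_task *t, bool signal_during_sleep)
{
	if (t->signal_pending)
		return -EINTR;
	fake_schedule(t, signal_during_sleep);
	return 0;
}

/*
 * Ordering the commit message describes as the fix: sleep first,
 * then check, so a signal delivered during the sleep is reported.
 */
static int wait_sleep_then_check(struct fake_task *t, bool signal_during_sleep)
{
	fake_schedule(t, signal_during_sleep);
	if (t->signal_pending)
		return -EINTR;
	return 0;
}
```

With a signal arriving during the sleep, wait_check_then_sleep() returns 0 while leaving the signal pending, whereas wait_sleep_then_check() returns -EINTR. The sketch only models the ordering; it says nothing about which caller trips the BUG_ON in the trace.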

>
> Regards,
> Jan
>
> [1] https://github.com/linux-test-project/ltp
>
