[4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

From: CAI Qian
Date: Wed Oct 19 2016 - 10:46:06 EST


It turns out this is only reproducible when intel_uncore is compiled as a builtin, i.e.,
not as a module. It can still be reproduced with yesterday's mainline.

Here is some information about the system:

Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

[   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
[   66.369911] Intel CQM monitoring enabled
[   66.374445] Intel MBM enabled
[   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
[   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
[   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
[   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[   66.434040] ================================================================================
[   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
[   66.450653] member access within null pointer of type 'struct device'
[   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
[   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
[   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
[   66.502073] Call Trace:
[   66.504811]  [<ffffffff81d370b4>] dump_stack+0xc0/0x12c
[   66.510644]  [<ffffffff81d36ff4>] ? _atomic_dec_and_lock+0xc4/0xc4
[   66.517548]  [<ffffffff81e5ac85>] ubsan_epilogue+0xd/0x8a
[   66.523574]  [<ffffffff81e5ae68>] __ubsan_handle_type_mismatch+0x166/0x434
[   66.531253]  [<ffffffff813294dd>] ? get_lock_stats+0x1d/0x120
[   66.537667]  [<ffffffff81e5ad02>] ? ubsan_epilogue+0x8a/0x8a
[   66.543985]  [<ffffffff82241acc>] device_del+0x6fc/0x860
[   66.549917]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.557494]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[   66.564202]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[   66.571006]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.577619]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[   66.584422]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[   66.591025]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[   66.597539]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[   66.604340]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[   66.611334]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[   66.617749]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[   66.624264]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[   66.631258]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[   66.638349]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.644959]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[   66.650976]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[   66.657292]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[   66.663704]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[   66.670700]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[   66.677694]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[   66.684694]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[   66.691300]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[   66.698006]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[   66.704710]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[   66.711025]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[   66.718116]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
[   66.723949]  [<ffffffff81326800>] ? up_read+0x40/0x40
[   66.729587]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.737165]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[   66.743000]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[   66.749900]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[   66.756219]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[   66.763013]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   66.769039]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
[   66.774967]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   66.780993]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[   66.787019] ================================================================================
[   66.796479] kasan: CONFIG_KASAN_INLINE enabled
[   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   66.817878] Modules linked in:
[   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
[   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
[   66.847225] RIP: 0010:[<ffffffff82241466>]  [<ffffffff82241466>] device_del+0x96/0x860
[   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
[   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
[   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
[   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
[   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
[   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
[   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
[   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.941154] Stack:
[   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
[   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
[   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
[   66.968296] Call Trace:
[   66.971025]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.978603]  [<ffffffff822413d0>] ? cleanup_glue_dir+0x140/0x140
[   66.985309]  [<ffffffff8160a6f2>] perf_pmu_unregister+0x142/0x6d0
[   66.992111]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   66.998720]  [<ffffffff810559f7>] uncore_pmu_unregister+0x67/0xd0
[   67.005523]  [<ffffffff8105ae6c>] uncore_pci_remove+0x32c/0x510
[   67.012131]  [<ffffffff81ec8392>] pci_device_remove+0xb2/0x240
[   67.018641]  [<ffffffff8224fe76>] driver_probe_device+0x146/0xfc0
[   67.025442]  [<ffffffff82250cf0>] ? driver_probe_device+0xfc0/0xfc0
[   67.032437]  [<ffffffff82250ea5>] __driver_attach+0x1b5/0x230
[   67.038852]  [<ffffffff82248e60>] bus_for_each_dev+0x130/0x200
[   67.045361]  [<ffffffff81353300>] ? do_raw_spin_trylock+0x110/0x110
[   67.052357]  [<ffffffff82248d30>] ? subsys_dev_iter_init+0x100/0x100
[   67.059450]  [<ffffffff81278cae>] ? preempt_count_sub+0x5e/0xe0
[   67.066056]  [<ffffffff8224eaa2>] driver_attach+0x42/0x70
[   67.072081]  [<ffffffff8224d846>] bus_add_driver+0x406/0x870
[   67.078397]  [<ffffffff822535b9>] driver_register+0x1a9/0x3d0
[   67.084809]  [<ffffffff81352942>] ? __raw_spin_lock_init+0x32/0x120
[   67.091803]  [<ffffffff81ec2a1d>] __pci_register_driver+0x1ad/0x2b0
[   67.098798]  [<ffffffff81ec2870>] ? pci_pm_runtime_idle+0x180/0x180
[   67.105792]  [<ffffffff858f57b5>] intel_uncore_init+0x58d/0x64c
[   67.112399]  [<ffffffff858ed56d>] ? amd_iommu_pc_init+0x16/0x344
[   67.119103]  [<ffffffff858f5228>] ? uncore_type_init+0x5cb/0x5cb
[   67.125806]  [<ffffffff81000587>] do_one_initcall+0xb7/0x2a0
[   67.132124]  [<ffffffff810004d0>] ? initcall_blacklisted+0x1a0/0x1a0
[   67.139215]  [<ffffffff8132687d>] ? up_write+0x7d/0x120
[   67.145046]  [<ffffffff81326800>] ? up_read+0x40/0x40
[   67.150684]  [<ffffffff82c8a5d2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   67.158262]  [<ffffffff8130db04>] ? __wake_up+0x44/0x50
[   67.164094]  [<ffffffff858e71b9>] kernel_init_freeable+0x68a/0x768
[   67.170992]  [<ffffffff858e6b2f>] ? start_kernel+0x751/0x751
[   67.177310]  [<ffffffff81075ec0>] ? compat_start_thread+0xa0/0xa0
[   67.184111]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   67.190137]  [<ffffffff82c704d3>] kernel_init+0x13/0x140
[   67.196064]  [<ffffffff82c704c0>] ? rest_init+0x190/0x190
[   67.202090]  [<ffffffff82c8b0d7>] ret_from_fork+0x27/0x40
[   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
[   67.229872] RIP  [<ffffffff82241466>] device_del+0x96/0x860
[   67.236101]  RSP <ffff880847aff868>
[   67.240059] ---[ end trace 69358e866a1e3f6c ]---
[   67.245377] Kernel panic - not syncing: Fatal exception
[   67.251271] ---[ end Kernel panic - not syncing: Fatal exception
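
A note on why the NULL pointer shows up as a general protection fault rather
than a page fault: the Code: bytes around the faulting instruction appear to be
an inline KASAN check (copy RBX into RDX, load dffffc0000000000 into RAX, shift
RDX right by 3, then compare the byte at RAX+RDX with 0). With RBX = 0 that
reads the shadow byte for address 0. The sketch below is only an illustration
of that arithmetic, assuming the usual x86_64 shadow mapping of
shadow = (addr >> 3) + 0xdffffc0000000000 and 48-bit virtual addresses; the
helper names are made up for this example and are not kernel functions.

```python
# Shadow offset as seen in RAX in the register dump (assumed to be the
# x86_64 KASAN shadow offset for this config).
KASAN_SHADOW_OFFSET = 0xdffffc0000000000
# One shadow byte covers 8 bytes of memory (the "shr $3" in the Code: bytes).
KASAN_SHADOW_SCALE_SHIFT = 3
MASK64 = 0xffffffffffffffff


def kasan_shadow_addr(addr):
    """Hypothetical helper: shadow address the inline check would read."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit virtual addressing assumed)."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


# RBX = 0 in the dump: the NULL 'struct device' pointer passed to device_del().
null_shadow = kasan_shadow_addr(0x0)
print(hex(null_shadow), "canonical:", is_canonical(null_shadow))

# R14 from the same dump, a valid kernel pointer, for comparison.
valid_shadow = kasan_shadow_addr(0xffff880844094800)
print(hex(valid_shadow), "canonical:", is_canonical(valid_shadow))
```

Under those assumptions the shadow address for a NULL pointer is non-canonical,
which would explain the "GPF could be caused by NULL-ptr deref" hint from kasan
and why CR2 stays 0, while the shadow address for a valid kernel pointer is an
ordinary canonical address.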


----- Original Message -----
> From: "Rob Herring" <robh@xxxxxxxxxx>
> To: "Greg Kroah-Hartman" <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: "CAI Qian" <caiqian@xxxxxxxxxx>, "linux-kernel" <linux-kernel@xxxxxxxxxxxxxxx>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
>
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
>
> I think this one is different though. It has a remove() hook.
>
> Rob
>