[bug report from blktests nvme/032] WARNING: possible circular locking dependency detected

From: yangx.jy@xxxxxxxxxxx
Date: Tue Jun 28 2022 - 00:05:17 EST


Hi everyone,

Running blktests nvme/032 on kernel v5.19-rc2+ which enables
CONFIG_LOCKDEP and CONFIG_PROVE_LOCKING triggered the below WARNING.

It seems that the deadlock comes from fs/kernfs and driver/pci.
I am not familiar with the related code so I hope someone can give some
advice.

[ 8245.501691] run blktests nvme/032 at 2022-06-28 11:02:59

[ 8253.061372] ======================================================
[ 8253.061372] WARNING: possible circular locking dependency detected
[ 8253.061372] 5.19.0-rc2+ #33 Not tainted
[ 8253.061372] ------------------------------------------------------
[ 8253.061372] check/1875 is trying to acquire lock:
[ 8253.061372] ffff91a2420277e8 (kn->active#282){++++}-{0:0}, at:
kernfs_remove_by_name_ns+0x53/0x90
[ 8253.061372]
but task is already holding lock:
[ 8253.061372] ffffffffb5a07ae8 (pci_rescan_remove_lock){+.+.}-{3:3},
at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[ 8253.061372]
which lock already depends on the new lock.

[ 8253.061372]
the existing dependency chain (in reverse order) is:
[ 8253.061372]
-> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[ 8253.061372] __mutex_lock+0x89/0xee0
[ 8253.061372] dev_rescan_store+0x58/0x80
[ 8253.061372] kernfs_fop_write_iter+0x140/0x1e0
[ 8253.061372] new_sync_write+0x10c/0x190
[ 8253.061372] vfs_write+0x268/0x370
[ 8253.061372] ksys_write+0x65/0xe0
[ 8253.061372] do_syscall_64+0x3b/0x90
[ 8253.061372] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 8253.061372]
-> #0 (kn->active#282){++++}-{0:0}:
[ 8253.061372] __lock_acquire+0x131d/0x2810
[ 8253.061372] lock_acquire+0xd5/0x2f0
[ 8253.061372] __kernfs_remove+0x281/0x2e0
[ 8253.061372] kernfs_remove_by_name_ns+0x53/0x90
[ 8253.061372] remove_files.isra.0+0x30/0x70
[ 8253.061372] sysfs_remove_group+0x3d/0x80
[ 8253.061372] sysfs_remove_groups+0x29/0x40
[ 8253.061372] device_remove_attrs+0xb3/0xf0
[ 8253.061372] device_del+0x1a2/0x410
[ 8253.061372] pci_remove_bus_device+0x70/0x110
[ 8253.061372] pci_stop_and_remove_bus_device_locked+0x1e/0x30
[ 8253.061372] remove_store+0x75/0x90
[ 8253.061372] kernfs_fop_write_iter+0x140/0x1e0
[ 8253.061372] new_sync_write+0x10c/0x190
[ 8253.061372] vfs_write+0x268/0x370
[ 8253.061372] ksys_write+0x65/0xe0
[ 8253.061372] do_syscall_64+0x3b/0x90
[ 8253.061372] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 8253.061372]
other info that might help us debug this:

[ 8253.061372] Possible unsafe locking scenario:

[ 8253.061372] CPU0 CPU1
[ 8253.061372] ---- ----
[ 8253.061372] lock(pci_rescan_remove_lock);
[ 8253.061372] lock(kn->active#282);
[ 8253.061372] lock(pci_rescan_remove_lock);
[ 8253.061372] lock(kn->active#282);
[ 8253.061372]
*** DEADLOCK ***

[ 8253.061372] 3 locks held by check/1875:
[ 8253.061372] #0: ffff91a275567458 (sb_writers#4){.+.+}-{0:0}, at:
ksys_write+0x65/0xe0
[ 8253.061372] #1: ffff91a246939488 (&of->mutex){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0x10c/0x1e0
[ 8253.061372] #2: ffffffffb5a07ae8
(pci_rescan_remove_lock){+.+.}-{3:3}, at:
pci_stop_and_remove_bus_device_locked+0xe/0x30
[ 8253.061372]
stack backtrace:
[ 8253.061372] CPU: 1 PID: 1875 Comm: check Kdump: loaded Not tainted
5.19.0-rc2+ #33
[ 8253.061372] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014
[ 8253.074562] Call Trace:
[ 8253.074562] <TASK>
[ 8253.074562] dump_stack_lvl+0x56/0x6f
[ 8253.074562] check_noncircular+0xfe/0x110
[ 8253.074562] __lock_acquire+0x131d/0x2810
[ 8253.074562] lock_acquire+0xd5/0x2f0
[ 8253.074562] ? kernfs_remove_by_name_ns+0x53/0x90
[ 8253.074562] ? lock_release+0x144/0x2c0
[ 8253.074562] ? lock_is_held_type+0xe2/0x140
[ 8253.074562] __kernfs_remove+0x281/0x2e0
[ 8253.074562] ? kernfs_remove_by_name_ns+0x53/0x90
[ 8253.074562] ? kernfs_name_hash+0x12/0x80
[ 8253.074562] kernfs_remove_by_name_ns+0x53/0x90
[ 8253.074562] remove_files.isra.0+0x30/0x70
[ 8253.074562] sysfs_remove_group+0x3d/0x80
[ 8253.074562] sysfs_remove_groups+0x29/0x40
[ 8253.074562] device_remove_attrs+0xb3/0xf0
[ 8253.074562] ? kernfs_remove_by_name_ns+0x5b/0x90
[ 8253.074562] device_del+0x1a2/0x410
[ 8253.074562] pci_remove_bus_device+0x70/0x110
[ 8253.074562] pci_stop_and_remove_bus_device_locked+0x1e/0x30
[ 8253.074562] remove_store+0x75/0x90
[ 8253.074562] kernfs_fop_write_iter+0x140/0x1e0
[ 8253.074562] new_sync_write+0x10c/0x190
[ 8253.074562] vfs_write+0x268/0x370
[ 8253.074562] ksys_write+0x65/0xe0
[ 8253.074562] do_syscall_64+0x3b/0x90
[ 8253.074562] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 8253.074562] RIP: 0033:0x7f0db99004a7
[ 8253.074562] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00
00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 8253.074562] RSP: 002b:00007ffd07c32428 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 8253.074562] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007f0db99004a7
[ 8253.074562] RDX: 0000000000000002 RSI: 0000561722b20200 RDI:
0000000000000001
[ 8253.074562] RBP: 0000561722b20200 R08: 000000000000000a R09:
0000000000000001
[ 8253.074562] R10: 0000561722c050a0 R11: 0000000000000246 R12:
0000000000000002
[ 8253.074562] R13: 00007f0db99d1500 R14: 0000000000000002 R15:
00007f0db99d1700
[ 8253.074562] </TASK>
[ 8253.289448] pci 0000:00:03.0: [8086:5845] type 00 class 0x010802
[ 8253.299765] pci 0000:00:03.0: reg 0x10: [mem 0xfebd0000-0xfebd3fff 64bit]
[ 8253.418780] pci 0000:00:03.0: BAR 0: assigned [mem
0x640000000-0x640003fff 64bit]
[ 8253.481841] nvme nvme0: pci function 0000:00:03.0
[ 8253.660699] nvme nvme0: 4/0/0 default/read/poll queues
[ 8253.681913] nvme nvme0: Ignoring bogus Namespace Identifiers

Best Regards,
Xiao Yang