[PATCH v4 0/1] nvmet-fcloop: unblock module removal

From: Daniel Wagner
Date: Tue Jun 06 2023 - 08:24:33 EST


With the recent FC related changes in blktests we only need one patch to unlock
all tests. That means blktests is able to run through all tests but not all will
passs. These fail look like real bugs in the FC code except nvme/040. I do see
the unload loop module failure also with tcp, rdma and loop.

We have an issue open on github to track the 'module unload before fc resource
cleanup' bug:

https://github.com/osandov/blktests/issues/120

Anyway, let's address one thing at the time:

The nvme/048 will block forever on the current nvme-6.4 branch:

run blktests nvme/048 at 2023-06-06 13:04:49
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association created
[478] nvmet: ctrl 1 start keep-alive timer for 1 secs
[478] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
[478] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
[478] nvmet: No authentication provided
nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
[478] nvmet: adding queue 1 to ctrl 1.
[407] nvmet: adding queue 2 to ctrl 1.
[6549] nvmet: adding queue 3 to ctrl 1.
[1760] nvmet: adding queue 4 to ctrl 1.
nvme nvme2: NVME-FC{0}: controller connect complete
nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
[478] nvmet: ctrl 1 reschedule traffic based keep-alive timer
[407] nvmet: ctrl 1 update keep-alive timer for 1 secs
nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
nvme nvme2: NVME-FC{0}: resetting controller
[478] nvmet: ctrl 1 stop keep-alive
(NULL device *): {0:0} Association deleted
nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association freed
(NULL device *): {0:0} Association created
(NULL device *): Disconnect LS failed: No Association
[407] nvmet: ctrl 1 start keep-alive timer for 1 secs
[407] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
[407] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
[407] nvmet: No authentication provided
nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
nvme nvme2: reconnect: revising io queue count from 4 to 1
[478] nvmet: adding queue 1 to ctrl 1.
nvme nvme2: NVME-FC{0}: controller connect complete
[478] nvmet: ctrl 1 reschedule traffic based keep-alive timer
[478] nvmet: ctrl 1 update keep-alive timer for 1 secs
[478] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[407] nvmet: ctrl 1 update keep-alive timer for 1 secs
nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
nvme nvme2: NVME-FC{0}: resetting controller
[478] nvmet: ctrl 1 stop keep-alive
(NULL device *): {0:0} Association deleted
(NULL device *): {0:0} Association freed
(NULL device *): Disconnect LS failed: No Association
nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association created
[1760] nvmet: ctrl 1 start keep-alive timer for 1 secs
[1760] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
[1760] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
[1760] nvmet: No authentication provided
nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
nvme nvme2: reconnect: revising io queue count from 1 to 4
[478] nvmet: adding queue 1 to ctrl 1.
[407] nvmet: adding queue 2 to ctrl 1.
[6549] nvmet: adding queue 3 to ctrl 1.
[1760] nvmet: adding queue 4 to ctrl 1.
nvme nvme2: NVME-FC{0}: controller connect complete
[1760] nvmet: ctrl 1 reschedule traffic based keep-alive timer
[478] nvmet: ctrl 1 update keep-alive timer for 1 secs
[478] nvmet: ctrl 1 update keep-alive timer for 1 secs
[478] nvmet: ctrl 1 update keep-alive timer for 1 secs
[478] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
[1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
[6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
[407] nvmet: ctrl 1 stop keep-alive
(NULL device *): {0:0} Association deleted
(NULL device *): {0:0} Association freed
(NULL device *): Disconnect LS failed: No Association
nvme_fc: nvme_fc_exit_module: waiting for ctlr deletes
BTRFS info (device vda2): qgroup scan completed (inconsistency flag cleared)
INFO: task modprobe:11758 blocked for more than 491 seconds.
Tainted: G W 6.4.0-rc2+ #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:modprobe state:D stack:0 pid:11758 ppid:11585 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x17b5/0x4870
? finish_lock_switch+0x8f/0x120
? finish_task_switch+0x17f/0x5c0
? __sched_text_start+0x10/0x10
? __schedule+0x17bd/0x4870
? do_raw_spin_trylock+0xc9/0x1f0
schedule+0xe6/0x1e0
schedule_timeout+0x7e/0x1e0
? __cfi_schedule_timeout+0x10/0x10
? do_raw_spin_trylock+0xc9/0x1f0
? __cfi_lock_release+0x10/0x10
? do_raw_spin_unlock+0x116/0x8a0
wait_for_common+0x377/0x600
? wait_for_completion+0x30/0x30
cleanup_module+0x222/0x240 [nvme_fc bda1ef6f83d00208ba257c5d0ac34dd6bdb58bf9]
__se_sys_delete_module+0x388/0x550
? __x64_sys_delete_module+0x50/0x50
? task_work_run+0x236/0x290
? syscall_enter_from_user_mode+0x2e/0x210
do_syscall_64+0x6e/0xa0
? syscall_exit_to_user_mode+0x5e/0x220
? do_syscall_64+0x7d/0xa0
? syscall_exit_to_user_mode+0x5e/0x220
? do_syscall_64+0x7d/0xa0
? syscall_exit_to_user_mode+0x5e/0x220
? do_syscall_64+0x7d/0xa0
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fa80811aebb
RSP: 002b:00007fff80ea0a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055a0b5acf1f0 RCX: 00007fa80811aebb
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a0b5acf258
RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
R10: 00007fa808193ac0 R11: 0000000000000206 R12: 00007fff80ea0ad0
R13: 000055a0b5acf1f0 R14: 0000000000000000 R15: 0000000000000000
</TASK>
INFO: lockdep is turned off.


with the patch nvme/048 test will at least terminate:


# nvme_trtype=fc nvme_img_size=16M nvme_num_iter=10 ./check nvme
nvme/002 (create many subsystems and test discovery) [not run]
nvme_trtype=fc is not supported in this test
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
runtime 10.337s ... 10.413s
nvme/004 (test nvme and nvmet UUID NS descriptors) [passed]
runtime 1.817s ... 1.988s
nvme/005 (reset local loopback target) [passed]
runtime 1.951s ... 2.128s
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
runtime 0.176s ... 0.271s
nvme/007 (create an NVMeOF target with a file-backed ns) [passed]
runtime 0.124s ... 0.191s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
runtime 1.899s ... 2.062s
nvme/009 (create an NVMeOF host with a file-backed ns) [passed]
runtime 1.815s ... 2.004s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
runtime 2.795s ... 3.367s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
runtime 3.196s ... 3.934s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [not run]
nvme_img_size must be at least 350m
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [not run]
nvme_img_size must be at least 350m
nvme/014 (flush a NVMeOF block device-backed ns) [passed]
runtime 2.012s ... 2.270s
nvme/015 (unit test for NVMe flush for file backed ns) [passed]
runtime 1.953s ... 2.237s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
nvme_trtype=fc is not supported in this test
nvme/017 (create/delete many file-ns and test discovery) [not run]
nvme_trtype=fc is not supported in this test
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
runtime 1.862s ... 1.996s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
runtime 1.807s ... 2.075s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
runtime 1.739s ... 1.926s
nvme/021 (test NVMe list command on NVMeOF file-backed ns) [passed]
runtime 1.807s ... 2.041s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns) [passed]
runtime 1.944s ... 2.066s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
runtime 1.772s ... 1.901s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
runtime 1.800s ... 1.997s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
runtime 1.806s ... 1.955s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
runtime 1.837s ... 1.945s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
runtime 1.762s ... 2.006s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
runtime 1.801s ... 2.102s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
runtime 2.175s ... 2.597s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
runtime 0.616s ... 0.805s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
runtime 3.415s ... 4.329s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
runtime 0.064s ... 0.114s
nvme/040 (test nvme fabrics controller reset/disconnect operation during I/O)
nvme/040 (test nvme fabrics controller reset/disconnect operation during I/O) [passed]
runtime 7.854s ... 8.058srk/blktests/results/tmpdir.nvme.040.qkb': Directory not empty
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
modprobe: FATAL: Module loop is in use.
nvme/041 (Create authenticated connections)
runtime 0.302s ...
WARNING: Test did not clean up fc device: nvme2
nvme/041 (Create authenticated connections) [failed]
runtime 0.302s ... 0.435s
--- tests/nvme/041.out 2023-04-18 17:43:18.167745976 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/041.out.bad 2023-06-06 14:07:48.836546072 +0200
@@ -1,6 +1,8 @@
Running nvme/041
Test unauthenticated connection (should fail)
+failed to lookup subsystem for controller nvme2
NQN:blktests-subsystem-1 disconnected 0 controller(s)
Test authenticated connection
-NQN:blktests-subsystem-1 disconnected 1 controller(s)
+failed to lookup subsystem for controller nvme2
...
(Run 'diff -u tests/nvme/041.out /home/wagi/work/blktests/results/nodev/nvme/041.out.bad' to see the entire diff)
nvme/042 (Test dhchap key types for authenticated connections) [passed]
runtime 4.832s ... 4.489s
nvme/043 (Test hash and DH group variations for authenticated connections) [passed]
runtime 2.698s ... 9.475s
nvme/044 (Test bi-directional authentication)
runtime 1.205s ...
WARNING: Test did not clean up fc device: nvme2
nvme/044 (Test bi-directional authentication) [failed]
runtime 1.205s ... 1.048s
--- tests/nvme/044.out 2023-04-18 17:43:18.167745976 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/044.out.bad 2023-06-06 14:08:08.524660313 +0200
@@ -2,9 +2,12 @@
Test host authentication
NQN:blktests-subsystem-1 disconnected 1 controller(s)
Test invalid ctrl authentication (should fail)
+failed to lookup subsystem for controller nvme2
NQN:blktests-subsystem-1 disconnected 0 controller(s)
Test valid ctrl authentication
-NQN:blktests-subsystem-1 disconnected 1 controller(s)
...
(Run 'diff -u tests/nvme/044.out /home/wagi/work/blktests/results/nodev/nvme/044.out.bad' to see the entire diff)
nvme/045 (Test re-authentication) [passed]
runtime 6.133s ... 4.742s
nvme/047 (test different queue types for fabric transports) [not run]
nvme_trtype=fc is not supported in this test
nvme/048 (Test queue count changes on reconnect) [failed]
runtime 16.300s ... 16.278s
--- tests/nvme/048.out 2023-06-05 17:20:59.370208331 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/048.out.bad 2023-06-06 14:08:33.720806513 +0200
@@ -1,3 +1,7 @@
Running nvme/048
+expected state "connecting" not reached within 5 seconds
+FAIL
+expected state "connecting" not reached within 5 seconds
+FAIL
NQN:blktests-subsystem-1 disconnected 1 controller(s)
Test complete


Daniel Wagner (1):
nvmet-fcloop: Do not wait on completion when unregister fails

changes:
v4:
- drop module unload before fc resource cleanup patches

v3:
- do not unlink rport twice
- https://lore.kernel.org/linux-nvme/20230418130159.11075-1-dwagner@xxxxxxx/

v2:
- added additional fixes
- https://lore.kernel.org/linux-nvme/20230411120718.14477-1-dwagner@xxxxxxx/

v1:
- initial version
- https://lore.kernel.org/linux-nvme/20230411092209.12719-1-dwagner@xxxxxxx/

drivers/nvme/target/fcloop.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--
2.40.1