[RFC blktests v1 0/1] Test case for 'nvme: short-circuit connection retries'

From: Daniel Wagner
Date: Wed Jun 21 2023 - 11:58:46 EST


We had a longer discussion in [1] on how to interpret the DNR bit on reconnect
attempts. The conclusion (if I got this right) was that we should not try to
reconnect using the same parameters when the error response had the DNR bit set.
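
As a rough illustration of that rule (not the kernel code), the sketch below
splits a connect status into its DNR bit and the remaining status code, then
decides whether a retry with unchanged parameters makes sense. The mask 0x4000
matches NVME_SC_DNR in the kernel's include/linux/nvme.h; the helper names are
made up for this example. The value 16772 is the status reported in the logs
further down.

```python
from typing import Tuple

# Illustrative only: the helper names are hypothetical, not kernel API.
NVME_SC_DNR = 0x4000  # Do Not Retry bit, as defined in include/linux/nvme.h


def split_status(status: int) -> Tuple[bool, int]:
    """Return (dnr_set, status_without_dnr) for a host-side status value."""
    return bool(status & NVME_SC_DNR), status & ~NVME_SC_DNR


def should_retry_connect(status: int) -> bool:
    """For a failed connect, retry with the same parameters only if DNR is clear."""
    dnr, _ = split_status(status)
    return not dnr


if __name__ == "__main__":
    dnr, code = split_status(16772)  # status value taken from the logs
    print(f"dnr={dnr} code={code:#x} retry={should_retry_connect(16772)}")
```

For 16772 (0x4184) this yields DNR set and code 0x184, which as far as I can
tell is NVME_SC_CONNECT_INVALID_HOST and fits the "not allowed" message in the
logs.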

The FC transport already implemented this behavior with

f25f8ef70ce2 ("nvme-fc: short-circuit reconnect retries")

Hannes also provided patches for TCP and RDMA [2]. With these patches this test
will pass.

nvme/050 implements this test case by (ab)using the queue count mechanism to
trigger a reconnect. Before the reconnect is triggered, the test sets the
attr_allow_any_host attribute to 0, which forces the reconnect to fail.
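
The two steps can be sketched roughly as follows. This is not the test from the
patch, only a minimal model of the sequence. It assumes the nvmet configfs
attribute names attr_allow_any_host and attr_qid_max, and it assumes that
writing attr_qid_max is the "queue count mechanism" that makes the target drop
existing controllers. The function name and the directory parameter are
hypothetical.

```python
from pathlib import Path

# Assumed layout: nvmet exposes per-subsystem attributes under configfs.
NVMET_SUBSYSTEMS = Path("/sys/kernel/config/nvmet/subsystems")


def force_failing_reconnect(subsys_dir: Path, new_qid_max: int) -> None:
    """Deny all hosts first, then change the queue count to trigger a reconnect."""
    # Step 1: with no host allowed, the next connect attempt gets rejected.
    (subsys_dir / "attr_allow_any_host").write_text("0\n")
    # Step 2 (assumption): changing attr_qid_max makes the target tear down
    # the existing controllers, so the host starts reconnecting.
    (subsys_dir / "attr_qid_max").write_text(f"{new_qid_max}\n")
```

On a configured target this would be called as
force_failing_reconnect(NVMET_SUBSYSTEMS / "blktests-subsystem-1", 1), which
needs root. The order matters: the host must already be locked out when the
reconnect starts.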

[1] https://lore.kernel.org/linux-nvme/20220927143157.3659-1-dwagner@xxxxxxx/
[2] https://lore.kernel.org/linux-nvme/20220715063356.134124-1-hare@xxxxxxx/


This patch is based on top of
blktests: https://lore.kernel.org/linux-nvme/20230620132703.20648-1-dwagner@xxxxxxx/
linux: https://lore.kernel.org/linux-nvme/20230620133711.22840-1-dwagner@xxxxxxx/


fc:

nvme/050 (test DNR is handled on connect attempt with invalid arguments) [passed]
runtime 8.845s ... 3.756s

tcp:

nvme/050 (test DNR is handled on connect attempt with invalid arguments) [failed]
runtime 3.756s ... 8.836s
--- tests/nvme/050.out 2023-06-21 11:47:47.767788898 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/050.out.bad 2023-06-21 15:19:08.368414289 +0200
@@ -1,2 +1,3 @@
Running nvme/050
+controller "nvme2" not deleted within 5 seconds
Test complete

fc:

run blktests nvme/050 at 2023-06-21 15:11:31
loop0: detected capacity change from 0 to 32768
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association created
[7088] nvmet: ctrl 1 start keep-alive timer for 5 secs
nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a.
[6743] nvmet: adding queue 1 to ctrl 1.
[6312] nvmet: adding queue 2 to ctrl 1.
[7088] nvmet: adding queue 3 to ctrl 1.
[6927] nvmet: adding queue 4 to ctrl 1.
nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
nvme nvme2: NVME-FC{0}: resetting controller
[7088] nvmet: ctrl 1 stop keep-alive
(NULL device *): {0:0} Association deleted
nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association freed
(NULL device *): {0:0} Association created
(NULL device *): Disconnect LS failed: No Association
nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a for subsystem blktests-subsystem-1 not allowed
nvme_fabrics: nvmf_log_connect_error: DNR 1
nvme nvme2: Connect for subsystem blktests-subsystem-1 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a
nvme nvme2: NVME-FC{0}: reset: Reconnect attempt failed (16772)
nvme nvme2: NVME-FC{0}: reconnect failure
nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association deleted
(NULL device *): {0:0} Association freed
(NULL device *): Disconnect LS failed: No Association

tcp:

run blktests nvme/050 at 2023-06-21 15:11:36
loop0: detected capacity change from 0 to 32768
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[62] nvmet: ctrl 1 start keep-alive timer for 5 secs
nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a.
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
[62] nvmet: adding queue 1 to ctrl 1.
[214] nvmet: adding queue 2 to ctrl 1.
[215] nvmet: adding queue 3 to ctrl 1.
[177] nvmet: adding queue 4 to ctrl 1.
nvme nvme2: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
nvme nvme2: starting error recovery
nvme nvme2: Reconnecting in 1 seconds...
[6743] nvmet: ctrl 1 stop keep-alive
nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a for subsystem blktests-subsystem-1 not allowed
nvme_fabrics: nvmf_log_connect_error: DNR 1
nvme nvme2: Connect for subsystem blktests-subsystem-1 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a
nvme nvme2: failed to connect queue: 0 ret=16772
nvme nvme2: Failed reconnect attempt 1
nvme nvme2: Reconnecting in 1 seconds...
nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a for subsystem blktests-subsystem-1 not allowed
nvme_fabrics: nvmf_log_connect_error: DNR 1
nvme nvme2: Connect for subsystem blktests-subsystem-1 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a
nvme nvme2: failed to connect queue: 0 ret=16772
nvme nvme2: Failed reconnect attempt 2
nvme nvme2: Reconnecting in 1 seconds...
nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a for subsystem blktests-subsystem-1 not allowed
nvme_fabrics: nvmf_log_connect_error: DNR 1
nvme nvme2: Connect for subsystem blktests-subsystem-1 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a
nvme nvme2: failed to connect queue: 0 ret=16772
nvme nvme2: Failed reconnect attempt 3
nvme nvme2: Reconnecting in 1 seconds...
nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a for subsystem blktests-subsystem-1 not allowed
nvme_fabrics: nvmf_log_connect_error: DNR 1
nvme nvme2: Connect for subsystem blktests-subsystem-1 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a
nvme nvme2: failed to connect queue: 0 ret=16772
nvme nvme2: Failed reconnect attempt 4
nvme nvme2: Reconnecting in 1 seconds...
nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a for subsystem blktests-subsystem-1 not allowed
nvme_fabrics: nvmf_log_connect_error: DNR 1
nvme nvme2: Connect for subsystem blktests-subsystem-1 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:77b49aba-06b4-431a-9af8-75e318740f1a
nvme nvme2: failed to connect queue: 0 ret=16772
nvme nvme2: Failed reconnect attempt 5
nvme nvme2: Reconnecting in 1 seconds...
nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
nvme nvme2: Property Set error: 880, offset 0x14

Daniel Wagner (1):
nvme/050: test DNR handling on reconnect

tests/nvme/050 | 126 +++++++++++++++++++++++++++++++++++++++++++++
tests/nvme/050.out | 2 +
2 files changed, 128 insertions(+)
create mode 100644 tests/nvme/050
create mode 100644 tests/nvme/050.out

--
2.41.0