Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect

From: Max Gurtovoy
Date: Thu Jul 13 2023 - 04:50:31 EST




On 13/07/2023 9:00, Hannes Reinecke wrote:
On 7/13/23 02:12, Max Gurtovoy wrote:


On 12/07/2023 15:04, Daniel Wagner wrote:
On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:


On 10/07/2023 18:03, Daniel Wagner wrote:
On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
I think it is more than just the commit message.

Okay, I'm starting to understand what the problem is.

A lot of code that we could otherwise avoid was added for the --context cmdline
argument.

Correct, and it's not optional: it's needed to get the tests passing for the fc transport.

Why does fc need --context to pass the tests?

A typical nvme test consists of the following steps (nvme/004):

// nvme target setup (1)
    _create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
        "91fdba0d-f87b-4c25-b80f-db7be1418b9e"
    _add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"

// nvme host setup (2)
    _nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1

    local nvmedev
    nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
    cat "/sys/block/${nvmedev}n1/uuid"
    cat "/sys/block/${nvmedev}n1/wwid"

// nvme host teardown (3)
    _nvme_disconnect_subsys blktests-subsystem-1

// nvme target teardown (4)
    _remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
    _remove_nvmet_subsystem "blktests-subsystem-1"


The corresponding output with --context

  run blktests nvme/004 at 2023-07-12 13:49:50
// (1)
  loop0: detected capacity change from 0 to 32768
  nvmet: adding nsid 1 to subsystem blktests-subsystem-1
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
  (NULL device *): {0:0} Association created
  [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
// (2)
  nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
  [374] nvmet: adding queue 1 to ctrl 1.
  [1138] nvmet: adding queue 2 to ctrl 1.
  [73] nvmet: adding queue 3 to ctrl 1.
  [174] nvmet: adding queue 4 to ctrl 1.
  nvme nvme2: NVME-FC{0}: controller connect complete
  nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
// (3)
  nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
// (4)
  [1138] nvmet: ctrl 1 stop keep-alive
  (NULL device *): {0:0} Association deleted
  (NULL device *): {0:0} Association freed
  (NULL device *): Disconnect LS failed: No Association


and without --context

  run blktests nvme/004 at 2023-07-12 13:50:33
// (1)
  loop1: detected capacity change from 0 to 32768
  nvmet: adding nsid 1 to subsystem blktests-subsystem-1
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"

Why is this association to the discovery controller created? Because of some system service?

Yes. There are nvme-autoconnect udev rules and systemd services installed by default (on quite a few systems now).
And it's really hard (if not impossible) to disable these services, as we cannot be sure how they are named, hence we wouldn't know which service to disable.
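
For reference, a hedged illustration of where those pieces typically live; the rule and unit names below come from the nvme-cli sources, and distributions may rename or relocate them:

    # udev rule that reacts to fabric discovery events
    ls /usr/lib/udev/rules.d/70-nvmf-autoconnect.rules
    # template service the rule starts to perform the actual connect
    systemctl list-units --all 'nvmf-connect@*'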

Right. We shouldn't disable them IMO.


Can we configure the blktests subsystem not to be discoverable, or add some access list to it?

But that's precisely what the '--context' thing is attempting to do ...

I'm not sure it is.

Exposing the subsystem is done on the target configuration side.
Additionally, the --context option (which is on the initiator/host side) is, according to Daniel, there to distinguish between different invocations. I proposed that the blktests subsystem not be part of the discoverable fabric, or that it be protected somehow by an access list; then no additional invocation would happen.
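
A minimal sketch of what such an access list could look like on the nvmet configfs side (the subsystem name matches the test above; the hostnqn path assumes a standard nvme-cli install and is only illustrative):

    # stop accepting arbitrary hosts and allow only our own hostnqn
    subsys=/sys/kernel/config/nvmet/subsystems/blktests-subsystem-1
    hostnqn=$(cat /etc/nvme/hostnqn)

    echo 0 > "${subsys}/attr_allow_any_host"
    mkdir "/sys/kernel/config/nvmet/hosts/${hostnqn}"
    ln -s "/sys/kernel/config/nvmet/hosts/${hostnqn}" \
        "${subsys}/allowed_hosts/${hostnqn}"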



[ .. ]

It really solves the problem that the autoconnect setup of nvme-cli is
disturbing the tests (*). The only other way I found to stop the autoconnect is to disable the udev rule completely. If autoconnect isn't enabled, the context isn't necessary.
Though changing the system configuration from blktests seems a bit excessive.
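
To make that concrete, a minimal sketch of a connect helper that tags its connections; the helper name is illustrative and the transport address arguments are omitted, only the --context option itself is the point:

    _nvme_connect_subsys() {
        local trtype="$1" subsysnqn="$2"
        local args=(--transport "${trtype}" --nqn "${subsysnqn}")

        # transport address options (--traddr etc.) omitted for brevity
        # tag the connection so the autoconnect machinery can tell it apart
        args+=(--context "blktests")

        nvme connect "${args[@]}"
    }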

We should not stop any autoconnect during blktests. The autoconnect and all the system admin services should run normally.

I do not agree here. The current blktests are not designed to run as
integration tests. Sure, we should also test this, but currently blktests is just not there, and tcp/rdma are not actually covered anyway.

What do you mean, tcp/rdma are not covered?

Because there is no autoconnect functionality for tcp/rdma.
For FC we have full topology information, and the driver can emit udev messages whenever an NVMe port appears in the fabric (and the systemd machinery will then start autoconnect).
For TCP/RDMA we do not have this, so really there's nothing which could send udev events (discounting things like mDNS and nvme-stas for now).
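
One way to see the difference is to watch the kernel uevents while running a test; the property names below follow the autoconnect udev rule shipped with nvme-cli and are meant as an illustration only:

    # fc emits discovery uevents the autoconnect rule can match on;
    # tcp/rdma have no equivalent event source
    udevadm monitor --kernel --property --subsystem-match=fc
    # expected fc properties: FC_EVENT=nvmediscovery,
    #                         NVMEFC_HOST_TRADDR=..., NVMEFC_TRADDR=...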

And maybe we should make several changes in blktests to make it standalone, without interfering with the existing configuration made by some system administrator.

??
But this is what we are trying to do with these patches.
The '--context' flag only needs to be set for blktests, to inform the rest of the system that these subsystems and this configuration are special and should be exempted from 'normal' system processing.

The --context is initiator-side configuration. I'm referring to changes in the target configuration.
This would guarantee that things also work in environments where nvme-cli does not have the --context flag.
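
If the host-side route is kept, a hedged sketch of how blktests could stay compatible with an nvme-cli that lacks the option (the helper name is illustrative, not actual blktests code):

    # only pass --context when the installed nvme-cli understands it
    _have_nvme_cli_context() {
        nvme connect --help 2>&1 | grep -q -- '--context'
    }

    if _have_nvme_cli_context; then
        args+=(--context "blktests")
    fi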


Cheers,

Hannes