Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect

From: Hannes Reinecke
Date: Thu Jul 13 2023 - 02:00:40 EST


On 7/13/23 02:12, Max Gurtovoy wrote:


On 12/07/2023 15:04, Daniel Wagner wrote:
On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:


On 10/07/2023 18:03, Daniel Wagner wrote:
On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
I think it is more than just commit message.

Okay, starting to understand what's the problem.

A lot of code that we can avoid was added regarding the --context cmdline
argument.

Correct and it's not optional to get the tests passing for the fc transport.

Why does fc need --context to pass the tests?

A typical nvme test consists of the following steps (nvme/004):

// nvme target setup (1)
    _create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
        "91fdba0d-f87b-4c25-b80f-db7be1418b9e"
    _add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"

// nvme host setup (2)
    _nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1

    local nvmedev
    nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
    cat "/sys/block/${nvmedev}n1/uuid"
    cat "/sys/block/${nvmedev}n1/wwid"

// nvme host teardown (3)
    _nvme_disconnect_subsys blktests-subsystem-1

// nvme target teardown (4)
    _remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
    _remove_nvmet_subsystem "blktests-subsystem-1"


The corresponding output with --context

  run blktests nvme/004 at 2023-07-12 13:49:50
// (1)
  loop0: detected capacity change from 0 to 32768
  nvmet: adding nsid 1 to subsystem blktests-subsystem-1
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
  (NULL device *): {0:0} Association created
  [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
// (2)
  nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
  [374] nvmet: adding queue 1 to ctrl 1.
  [1138] nvmet: adding queue 2 to ctrl 1.
  [73] nvmet: adding queue 3 to ctrl 1.
  [174] nvmet: adding queue 4 to ctrl 1.
  nvme nvme2: NVME-FC{0}: controller connect complete
  nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
// (3)
  nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
// (4)
  [1138] nvmet: ctrl 1 stop keep-alive
  (NULL device *): {0:0} Association deleted
  (NULL device *): {0:0} Association freed
  (NULL device *): Disconnect LS failed: No Association


and without --context

  run blktests nvme/004 at 2023-07-12 13:50:33
// (1)
  loop1: detected capacity change from 0 to 32768
  nvmet: adding nsid 1 to subsystem blktests-subsystem-1
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"

Why is this association to the discovery controller created? Because of some system service?

Yes. There are nvme-autoconnect udev rules and systemd services installed by default (on quite a few systems now).
And it's really hard (if not impossible) to disable these services, as we cannot be sure how they are named and hence wouldn't know which service to disable.

Can we configure the blktests subsystem not to be discovered, or add some access list to it?

But that's precisely what the '--context' thing is attempting to do ...

[ .. ]

It really solves the problem that the autoconnect setup of nvme-cli is
disturbing the tests (*). The only other way I found to stop the autoconnect is by disabling the udev rule completely. If autoconnect isn't enabled, the context isn't necessary.
Though changing system configuration from blktests seems a bit excessive.

we should not stop any autoconnect during blktests. The autoconnect and all the system admin services should run normally.

I do not agree here. The current blktests are not designed to run as
integration tests. Sure, we should also test this, but currently blktests is just not there, and tcp/rdma are not actually covered anyway.

What do you mean, tcp/rdma are not covered?

Because there is no autoconnect functionality for tcp/rdma.
For FC we have full topology information, and the driver can emit udev events whenever an NVMe port appears in the fabric (and the systemd machinery will then start autoconnect).
For TCP/RDMA we do not have this, so really there's nothing which could send udev events (discounting things like mDNS and nvme-stas for now).
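
To illustrate the FC case: since the unit and rule file names differ between systems, one way to see whether an autoconnect rule is installed at all is to search the udev rules for the event they react to. This is only a sketch; it assumes the rule matches on the FC_EVENT value "nvmediscovery", as the rule shipped with nvme-cli does to my knowledge, and the function name is made up for this example:

```shell
# Print every udev rule file that reacts to the FC "nvmediscovery" event.
# Arguments: one or more directories to scan.
# find_fc_autoconnect_rules is a hypothetical helper, not part of blktests.
find_fc_autoconnect_rules() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "${dir}" ] || continue
        for rule in "${dir}"/*.rules; do
            # Skip the unexpanded glob when a directory has no rule files.
            [ -f "${rule}" ] || continue
            if grep -q 'nvmediscovery' "${rule}"; then
                printf '%s\n' "${rule}"
            fi
        done
    done
}
```

Calling it as "find_fc_autoconnect_rules /etc/udev/rules.d /usr/lib/udev/rules.d" covers the usual rule locations; other distributions may keep rules elsewhere.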

And maybe we should make several changes in blktests to make it standalone, without interfering with the existing configuration made by some system administrator.

??
But this is what we are trying to do with these patches.
The '--context' flag only needs to be set for the blktests, to inform the rest of the system that these subsystems/configurations are special and should be exempted from 'normal' system processing.
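
Roughly, the idea looks like this on the command line. This is a sketch only, not the code from the patch: the helper name and the context value "blktests" are made up for illustration, and a real fc connect needs further arguments (transport addresses) which are left out here. The function only prints the argument list; it does not run nvme:

```shell
# Build the argument list for 'nvme connect', adding --context when given.
# build_connect_args is a hypothetical helper used only for this example.
build_connect_args() {
    trtype="$1"
    subsysnqn="$2"
    context="${3:-}"

    args="connect --transport ${trtype} --nqn ${subsysnqn}"
    # Tag the connection with a context so that other users of the
    # nvme-cli machinery can recognise it as not theirs to manage.
    if [ -n "${context}" ]; then
        args="${args} --context ${context}"
    fi
    printf '%s\n' "${args}"
}

build_connect_args fc blktests-subsystem-1 blktests
```

Without the third argument the list is the plain connect call, which is the case where the autoconnect services treat the subsystem like any other.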

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman