Re: [PATCH v5 0/3] Handle update hardware queues and queue freeze more carefully

From: Daniel Wagner
Date: Fri Aug 20 2021 - 04:48:35 EST

Next message: Dmitry Baryshkov: "Re: [RFC PATCH 14/15] WIP: PCI: qcom: use pwrseq to power up bus devices"
Previous message: David Hildenbrand: "Re: [PATCH v2 2/7] kernel/fork: factor out replacing the current MM exe_file"
In reply to: Daniel Wagner: "[PATCH v5 1/3] nvme-fc: Wait with a timeout for queue to freeze"
Next in thread: Daniel Wagner: "Re: [PATCH v5 0/3] Handle update hardware queues and queue freeze more carefully"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Aug 18, 2021 at 02:05:27PM +0200, Daniel Wagner wrote:
> I've dropped all non FC patches as they were bogus. I've retested this
> version with all combinations and all looks good now. Also I gave
> nvme-tcp a spin and again all is good.

I forgot to mention that I also dropped the first three patches from v4,
which seems to break Wendy's testing again.

Wendy reported that all her tests pass with Ming's V7 of 'blk-mq: fix
blk_mq_alloc_request_hctx' and this series *only* if 'nvme-fc: Update
hardware queues before using them' from the previous version is also
applied.

After staring at it once more, I think I finally understand the
problem. So when we do

	ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
	if (ret)
		goto out_free_io_queues;

	ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
	if (ret)
		goto out_delete_hw_queues;

and the number of queues has changed, the connect call will fail:

nvme2: NVME-FC{2}: create association : host wwpn 0x100000109b5a4dfa rport wwpn 0x50050768101935e5: NQN "nqn.1986-03.com.ibm:nvme:2145.0000020420006CEA"
nvme2: Connect command failed, error wo/DNR bit: -16389

so we abort the current reconnect attempt and schedule a new one:

nvme2: NVME-FC{2}: reset: Reconnect attempt failed (-5)
nvme2: NVME-FC{2}: Reconnect attempt in 2 seconds

Then we try the same thing again, which fails in the same way, so we
never make progress.
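
To make the loop above concrete, here is a toy model in plain userspace
C. It is not kernel code: struct toy_ctrl, toy_connect_io_queues() and
toy_reconnect() are hypothetical names, and the rule "connect fails
whenever the cached queue count differs from what the target grants" is
an assumption taken from the description above, not from the nvme-fc
source. It only shows why retrying with a stale count cannot succeed,
while refreshing the count before connecting can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name in this block is hypothetical. */
struct toy_ctrl {
	unsigned int cached_queues;  /* queue count the host still remembers */
	unsigned int granted_queues; /* queue count the target grants now */
};

/*
 * Stand-in for the connect step. Assumption: it fails whenever the
 * host's cached count no longer matches what the target grants.
 */
static int toy_connect_io_queues(const struct toy_ctrl *c)
{
	return c->cached_queues != c->granted_queues ? -1 : 0;
}

/*
 * Retry the connect up to max_attempts times. Returns the 1-based
 * attempt that succeeded, or 0 if every attempt failed. With
 * refresh_count == false nothing changes between attempts, so a
 * mismatch fails forever.
 */
static unsigned int toy_reconnect(struct toy_ctrl *c,
				  unsigned int max_attempts,
				  bool refresh_count)
{
	assert(c != NULL);

	for (unsigned int attempt = 1; attempt <= max_attempts; attempt++) {
		if (refresh_count)
			c->cached_queues = c->granted_queues;
		if (toy_connect_io_queues(c) == 0)
			return attempt;
		/* The real driver would schedule the next attempt here. */
	}
	return 0;
}
```

Calling toy_reconnect() with refresh_count set to false on a controller
whose counts differ returns 0 no matter how many attempts are allowed.
With refresh_count set to true it succeeds on the first attempt. The
model ignores frozen requests entirely, which is the hard part of the
real problem.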

So clearly we need to update the number of queues at some point. What
would be the right thing to do here? As I understand it, we need to be
careful with frozen requests. Can we abort them (is this even possible
in this state?) and requeue them before we update the queue numbers?

Daniel
