Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

From: Ming Lei
Date: Thu Jun 08 2023 - 09:52:07 EST


On Thu, Jun 08, 2023 at 10:48:50AM +0800, 许春光 wrote:
> Ming Lei <ming.lei@xxxxxxxxxx> 于2023年6月8日周四 08:56写道:
> >
> > On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> > > Hi Ming:
> > >
> > > Ming Lei <ming.lei@xxxxxxxxxx> 于2023年6月6日周二 23:15写道:
> > > >
> > > > Hello Chunguang,
> > > >
> > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > From: Chunguang Xu <chunguang.xu@xxxxxxxxxx>
> > > > >
> > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > >
> > > > I didn't dig into ctrl state check in these error handler yet, but error
> > > > handling is supposed to provide forward progress for any controller state.
> > > >
> > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > error handling and breaks the forward progress guarantee?
> > >
> > > Here we freezed ctrl, if ctrl state has changed to DELETING or
> > > DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
> > > freeze, so nvme_remove_namespaces() hang.
> > >
> > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > {
> > > ...
> > > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > /* state change failure is ok if we started ctrl delete */
> > > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > return;
> > > }
> > >
> > > nvme_tcp_reconnect_or_remove(ctrl);
> > > }
> > >
> > >
> > > Another path, we will check ctrl state while reconnecting, if it changes to
> > > DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
> > > queue quiescing (through reset path), as a result Hang occurs.
> > >
> > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > {
> > > /* If we are resetting/deleting then do nothing */
> > > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > ctrl->state == NVME_CTRL_LIVE);
> > > return;
> > > }
> > > ...
> > > }
> > >
> > > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > > hang in flush_work(&ctrl->scan_work).
> > > > >
> > > > > After analyzation, we found that there are mainly two case:
> > > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > > new IO to load partition table.
> > > >
> > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > >
> > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@xxxxxxxxxxxxxx/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > >
> > > While drive reconnecting, I think we should freeze ctrl or quiescing queue,
> > > otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
> > > and the IOs may retry frequently. So I think we may better freeze ctrl
> > > while entering
> > > error_recovery/reconnect, but need to unfreeze it while exit.
> >
> > quiescing is always done in error handling, and freeze is actually
> > not a must, and it is easier to cause race by calling freeze & unfreeze
> > from different contexts.
>
> I think if we donot freeze ctrl, as the IO already submit (just queue
> to hctx->dispatch) and may
> pending for a long time, it may trigger new hang task issue, but
> freeze ctrl may can avoid these
> hang task.

How can the freeze make the difference? If driver/device can't move on,
any request is stuck, so the IO path waits in either submit_bio() or
upper layer after returning from submit_bio().


Thanks,
Ming