RE: [PATCH v3 2/2] nvme: handle persistent internal error AER from NVMe controller

From: Michael Kelley (LINUX)
Date: Wed Jun 08 2022 - 02:54:58 EST


From: Christoph Hellwig <hch@xxxxxx> Sent: Tuesday, June 7, 2022 3:36 AM
>
> On Mon, Jun 06, 2022 at 05:15:15PM -0700, Michael Kelley wrote:
> > +static void nvme_handle_aer_persistent_error(struct nvme_ctrl *ctrl)
> > +{
> > + trace_nvme_async_event(ctrl, NVME_AER_ERROR);
> > +
> > + /*
> > + * We can't read the CSTS here because we're in an atomic context on
> > + * some transports and the read may require submitting a request to the
> > + * controller and getting a response. Such a sequence isn't
> > + * likely to be successful anyway if the controller is reporting a
> > + * persistent internal error. So assume CSTS.CFS is set.
> > + */
> > + if (nvme_should_reset(ctrl, NVME_CSTS_CFS)) {
> > + dev_warn(ctrl->device, "resetting controller due to AER\n");
> > + nvme_reset_ctrl(ctrl);
>
> I don't think we even need the nvme_should_reset check now.
>
> nvme_reset_ctrl first calls nvme_change_ctrl_state, which only allows
> the transition to the RESETTING state if it previously was NEW or LIVE,
> so we are already covered. The only downside would be an extra kernel
> message if we already were in another state.

OK, I agree. Patch 1/2 can be dropped since there's now no need to
move nvme_should_reset(), and patch 2 is simplified even further.
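
To spell out the reasoning above, here is a rough user-space model of the
guard. It is only a sketch and not the driver code: the model_* names, the
reduced state list, and the resets_queued counter are all made up for
illustration, and I'm assuming a failed state change is reported as -EBUSY.
The point is just that a reset requested from any state other than NEW or
LIVE is refused by the state change itself, so a separate check in the
caller adds nothing.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical, simplified stand-ins for the controller states. */
enum model_ctrl_state {
	MODEL_NEW,
	MODEL_LIVE,
	MODEL_RESETTING,
	MODEL_CONNECTING,
	MODEL_DELETING,
	MODEL_DEAD,
};

/* Hypothetical controller: just a state and a count of queued resets. */
struct model_ctrl {
	enum model_ctrl_state state;
	int resets_queued;
};

/*
 * Only the transition discussed in this thread is modeled: moving to
 * RESETTING is allowed if the previous state was NEW or LIVE. Every
 * other request is refused and leaves the state untouched.
 */
static bool model_change_state(struct model_ctrl *ctrl,
			       enum model_ctrl_state new_state)
{
	if (new_state != MODEL_RESETTING)
		return false;
	if (ctrl->state != MODEL_NEW && ctrl->state != MODEL_LIVE)
		return false;
	ctrl->state = new_state;
	return true;
}

/*
 * Model of the reset entry point: the state change is the gate, so the
 * caller does not need its own "should we reset" test. The -EBUSY
 * return value is an assumption of this sketch.
 */
static int model_reset_ctrl(struct model_ctrl *ctrl)
{
	if (!model_change_state(ctrl, MODEL_RESETTING))
		return -EBUSY;
	ctrl->resets_queued++;	/* stands in for queueing the reset work */
	return 0;
}
```

With a controller that starts out LIVE, the first model_reset_ctrl() call
succeeds and a second call made while it is still RESETTING is rejected,
which matches the "already covered" case above. The only thing the caller
can't avoid is printing its warning before it knows the outcome.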

I'll do a v4.

Michael