Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD

From: Robert Straw
Date: Fri Apr 30 2021 - 18:50:03 EST


On Fri Apr 30, 2021 at 3:51 PM CDT, Bjorn Helgaas wrote:
> Please make your subject line match ffb0863426eb ("PCI: Disable
> Samsung SM961/PM961 NVMe before FLR")

Understood, I will send a revision ASAP.

> There's always the possibility that we are doing something wrong in
> Linux *after* the FLR, e.g., not waiting long enough, not
> reinitializing something correctly, etc.

In my experience I was not able to get my particular drive to enter this
state by issuing various types of resets purely from the Linux host.
The issue only appeared when I passed the device to a KVM guest *and
allowed that guest to shut down cleanly.* The last part is crucial: if
the guest was forcibly powered off, Linux was able to reset the drive
just fine.

So I suspect the issue is an interaction between whatever state the
guest leaves the NVMe drive in after a clean shutdown and the host
kernel's own reset code, which together trigger some pathological
behavior in the controller.
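
For reference, as I understand it the ordering the quirk relies on (the
same idea as ffb0863426eb) is: clear CC.EN, wait for CSTS.RDY to drop,
and only then issue the FLR. Below is a minimal userspace sketch of that
ordering. It is not the kernel code: the accessor struct, the fake
controller, and the function names are hypothetical and exist only so
the logic can be run without hardware. The register offsets and bits
are the ones defined by the NVMe spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NVMe controller registers and bits, per the NVMe spec. */
#define NVME_REG_CC    0x14u       /* Controller Configuration */
#define NVME_REG_CSTS  0x1cu       /* Controller Status */
#define NVME_CC_EN     (1u << 0)
#define NVME_CSTS_RDY  (1u << 0)

/* Hypothetical register accessors so the sketch can run on a fake BAR. */
struct nvme_regs {
    uint32_t (*read32)(void *ctx, uint32_t off);
    void (*write32)(void *ctx, uint32_t off, uint32_t val);
    void *ctx;
};

/*
 * Clear CC.EN, then poll CSTS.RDY up to max_polls times.
 * Returns true once the controller reports not-ready, i.e. it is
 * safe to go on to the reset; false if it never got there.
 */
static bool nvme_disable_before_reset(struct nvme_regs *r, unsigned max_polls)
{
    assert(r && r->read32 && r->write32);

    uint32_t cc = r->read32(r->ctx, NVME_REG_CC);
    if (cc & NVME_CC_EN)
        r->write32(r->ctx, NVME_REG_CC, cc & ~NVME_CC_EN);

    for (unsigned i = 0; i < max_polls; i++) {
        if (!(r->read32(r->ctx, NVME_REG_CSTS) & NVME_CSTS_RDY))
            return true;
        /* Real code would sleep between polls instead of spinning. */
    }
    return false;
}

/* Fake controller: RDY stays set for a few CSTS reads after EN is cleared. */
struct fake_ctrl {
    uint32_t cc;
    uint32_t csts;
    unsigned rdy_reads_left;
};

static uint32_t fake_read32(void *ctx, uint32_t off)
{
    struct fake_ctrl *c = ctx;

    if (off == NVME_REG_CC)
        return c->cc;
    if (off == NVME_REG_CSTS) {
        if (!(c->cc & NVME_CC_EN) && (c->csts & NVME_CSTS_RDY)) {
            if (c->rdy_reads_left == 0)
                c->csts &= ~NVME_CSTS_RDY;
            else
                c->rdy_reads_left--;
        }
        return c->csts;
    }
    return 0;
}

static void fake_write32(void *ctx, uint32_t off, uint32_t val)
{
    struct fake_ctrl *c = ctx;

    if (off == NVME_REG_CC)
        c->cc = val;
}
```

The poll limit stands in for a timeout. A false return corresponds to a
controller that never drops RDY, in which case the caller has to decide
whether to attempt the reset anyway.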

FWIW even a remove/rescan, with an interim suspend to RAM, was not
enough to unfreeze the controller. The only way I've found to get the
device back (apart from this patch) was a full reboot.
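
In case it helps anyone reproduce the remove/rescan attempt, here is a
rough sketch of that sequence using the standard sysfs attributes
(<device>/remove and /sys/bus/pci/rescan). The helper names are
hypothetical, the sysfs root is a parameter only so the code can be
exercised outside a real system, and the suspend-to-RAM step in between
is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a sysfs attribute. Returns 0 on success, -1 on error. */
static int write_attr(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(val, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Build <root>/bus/pci/devices/<bdf>/remove. Returns 0 if it fit in buf. */
static int pci_remove_path(char *buf, size_t len, const char *root, const char *bdf)
{
    assert(buf && root && bdf);

    int n = snprintf(buf, len, "%s/bus/pci/devices/%s/remove", root, bdf);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Remove the device, then ask the PCI core to rescan the bus. */
static int pci_remove_and_rescan(const char *root, const char *bdf)
{
    char path[256];

    if (pci_remove_path(path, sizeof(path), root, bdf) != 0)
        return -1;
    if (write_attr(path, "1") != 0)
        return -1;

    /* The suspend to RAM between the two steps is omitted in this sketch. */

    int n = snprintf(path, sizeof(path), "%s/bus/pci/rescan", root);
    if (n <= 0 || (size_t)n >= sizeof(path))
        return -1;
    return write_attr(path, "1");
}
```

On a real system this would be called as root with "/sys" as the first
argument and the drive's own address as the second, for example
pci_remove_and_rescan("/sys", "0000:01:00.0"), where that address is
only a placeholder.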