Re: [PATCH] nvme: Don't fail to resume if NSIDs change

From: August Wikerfors
Date: Mon Jul 31 2023 - 15:54:48 EST


On 2023-07-31 21:10, Keith Busch wrote:
On Mon, Jul 31, 2023 at 01:51:03PM -0500, Mario Limonciello wrote:
Samsung PM9B1 has problems after resume because NSID has changed.
This has been reported in the past on OEM varities of PM9B1 parts
and fixed by firmware updates on 'some' of those parts.

However this same issue also happens on 'retail' PM9B1 parts which
Samsung has not released firmware updates for.

As the check has been relaxed at startup for multiple disks with
duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
probe due to duplicate IDs for single-ported PCIe devices") also
relax the check that runs on resume for NSIDs and mark them bogus
if this occurs on resume.

How could the driver tell the difference between the device needing a
quirk compared to a rapid delete-create-attach namespace sequence?
Proceeding with the namespace now may get dirty writes intended for the
previous namespace, corrupting the new one.

The commit you mentioned tries to constrain allowing duplication where
we can reasonably assume the quirk is needed. If we need to do similiar
for this condition, one possible constraint might be that the device
doesn't report OACS bit 3 (Namespace Management).

It looks like that would work for the PM9B1:
$ sudo nvme id-ctrl -H /dev/nvme0
[...] > oacs : 0x17
[10:10] : 0 Lockdown Command and Feature Not Supported
[9:9] : 0 Get LBA Status Capability Not Supported
[8:8] : 0 Doorbell Buffer Config Not Supported
[7:7] : 0 Virtualization Management Not Supported
[6:6] : 0 NVMe-MI Send and Receive Not Supported
[5:5] : 0 Directives Not Supported
[4:4] : 0x1 Device Self-test Supported
[3:3] : 0 NS Management and Attachment Not Supported
[2:2] : 0x1 FW Commit and Download Supported
[1:1] : 0x1 Format NVM Supported
[0:0] : 0x1 Security Send and Receive Supported

Regards,
August Wikerfors