Re: [PATCH] nvme-pci: add NVME_QUIRK_DELAY_BEFORE_CHK_RDY for MAXIO MAP1602

From: Felix Yan
Date: Thu Aug 17 2023 - 01:34:03 EST


On 7/14/23 16:13, Han Gao wrote:
4TB SSD with MAXIO MAP1602 controller is cannot by initialised
in nvme_enable_ctrl with a high probability, which causeds the system
to be unable to use SSD, and SSD device only be shown in lspci.

dmesg output of problem

----------
nvme nvme1: Device not ready; aborting initialisation, CSTS=0x0
----------

Problem and fix are verified with my MAP1602 controller SSD device.

Signed-off-by: Han Gao <highenthalpyh@xxxxxxxxx>
Signed-off-by: David Xu <xuwd1@xxxxxxxxxxx>
---
drivers/nvme/host/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 492f319ebdf3..f75c27730bde 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3425,7 +3425,8 @@ static const struct pci_device_id nvme_id_table[] = {
{ PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */
.driver_data = NVME_QUIRK_BOGUS_NID, },
{ PCI_DEVICE(0x1e4B, 0x1602), /* MAXIO MAP1602 */
- .driver_data = NVME_QUIRK_BOGUS_NID, },
+ .driver_data = NVME_QUIRK_BOGUS_NID |
+ NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
{ PCI_DEVICE(0x1cc1, 0x5350), /* ADATA XPG GAMMIX S50 */
.driver_data = NVME_QUIRK_BOGUS_NID, },
{ PCI_DEVICE(0x1dbe, 0x5236), /* ADATA XPG GAMMIX S70 */

Unfortunately this doesn't fix it for me. As someone pointed out in the forums [1], the quirk NVME_QUIRK_DELAY_BEFORE_CHK_RDY only takes effect in nvme_disable_ctrl(), but we are hitting the timeout issue in nvme_enable_ctrl() instead.

I have tried 6.5-rc5 with or without this patch but got mixed results due to even more severe disk-writing related issues (deadlock in writing to any disk).

With the following patch taken from the forums, the SSDs work flawlessly on 6.4.X kernels:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3ec38e2b9173..ab2583cb42aa 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2408,6 +2408,7 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
} else {
timeout = NVME_CAP_TIMEOUT(ctrl->cap);
}
+ dev_info(ctrl->device, "[PATCH] nvme core got timeout %u\n",timeout);

ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
@@ -2425,8 +2426,9 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
if (ret)
return ret;
+ dev_info(ctrl->device, "[PATCH] nvme_wait_ready now wait for %u, previously %u\n",(timeout + 1) * 2, (timeout + 1)/2);
return nvme_wait_ready(ctrl, NVME_CSTS_RDY, NVME_CSTS_RDY,
- (timeout + 1) / 2, "initialisation");
+ (timeout + 1) * 2, "initialisation");
}
EXPORT_SYMBOL_GPL(nvme_enable_ctrl);

Debug outputs in the dmesg:

[ 1.030057] nvme nvme1: [PATCH] nvme core got timeout 0
[ 1.030062] nvme nvme1: [PATCH] nvme_wait_ready now wait for 2, previously 0
[ 1.031356] nvme nvme0: [PATCH] nvme core got timeout 0
[ 1.031368] nvme nvme0: [PATCH] nvme_wait_ready now wait for 2, previously 0

6.1 LTS kernels don't have this problem because the timeout isn't 0 here.

Would it be a good idea to apply NVME_QUIRK_DELAY_BEFORE_CHK_RDY to nvme_enable_ctrl too? Or shall we add another quirk just for this?

[1] https://www.chiphell.com/forum.php?mod=viewthread&tid=2524660&extra=&page=5&mobile=no (in Chinese)

--
Regards,
Felix Yan

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature