Re: Query related to shutting down NVMe during system suspend

From: nitirawa
Date: Mon Feb 07 2022 - 06:06:56 EST


On 2022-02-01 22:28, Vidya Sagar wrote:
Thanks for the super quick reply and I couldn't agree more.

On 2/1/2022 10:00 PM, Keith Busch wrote:

On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
Hi Rafael & Christoph,
My query is regarding the comment and the code that follows after it at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
What I understood from it is that there is an underlying assumption
that the power to the devices is not removed during the suspend call.
In the case of device-tree based platforms like Tegra194, power is
indeed removed from the devices during the suspend-resume process. Hence, the
NVMe devices need to be taken through the shutdown path irrespective of
whether the ASPM states are enabled or not.
I would like to hear from you the best method to follow to achieve this.

Since platform makers can't converge on how to let a driver know what
it's supposed to do, I suggest we default to the simple shutdown suspend
all the time. We can add a module parameter to let a user request nvme
power management if they really want it. No matter what we do here,
someone is going to complain, but at least simple shutdown is safe...


Hi Vidya,

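Are you planning to add a module parameter based on the above discussion? I see similar behaviour on a Qualcomm platform as well: after resume, the controller times out on an admin command and pci_pm_resume fails with -16 after about 60 seconds.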

[ 119.994092] nvme nvme0: I/O 9 QID 0 timeout, reset controller
[ 120.006612] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe4 returns -16
[ 120.013502] nvme 0001:01:00.0: PM: pci_pm_resume+0x0/0xe4 returned -16 after 60059958 usecs
[ 120.022239] nvme 0001:01:00.0: PM: failed to resume async: error -16
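
For reference, here is a rough sketch of the default Keith suggests, as I understand it: always take the simple shutdown path on suspend unless the user explicitly asks for NVMe power management. This is a plain C model and not driver code. The names choose_suspend_action, user_opt_in and ctrl_has_power_states are hypothetical; in the driver the opt-in would presumably be exposed as a module parameter.

```c
#include <stdbool.h>

/* Hypothetical model of the two suspend paths discussed in this thread. */
enum suspend_action {
	SUSPEND_SHUTDOWN,	/* full controller shutdown, safe if power is removed */
	SUSPEND_NVME_PM,	/* keep the controller up and use NVMe power states */
};

/*
 * Assumption: user_opt_in mirrors a (not yet existing) module parameter and
 * ctrl_has_power_states says whether the controller reports any non-operational
 * power states. Shutdown is the default; NVMe power management is used only
 * when the user asked for it and the controller can actually do it.
 */
static enum suspend_action choose_suspend_action(bool user_opt_in,
						 bool ctrl_has_power_states)
{
	if (user_opt_in && ctrl_has_power_states)
		return SUSPEND_NVME_PM;
	return SUSPEND_SHUTDOWN;
}
```

With this default, platforms that cut power during suspend (Tegra194, and apparently the Qualcomm one above) would get the shutdown path without needing any platform-specific detection.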

Regards,
Nitin