Re: [Regression] Commit "nvme/pci: Use host managed power state for suspend" has problems

From: Kai-Heng Feng
Date: Thu Jul 25 2019 - 14:20:31 EST


at 01:03, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:

On Thu, Jul 25, 2019 at 6:24 PM <Mario.Limonciello@xxxxxxxx> wrote:
+Rajat

-----Original Message-----
From: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
Sent: Thursday, July 25, 2019 9:03 AM
To: Rafael J. Wysocki
Cc: Keith Busch; Christoph Hellwig; Sagi Grimberg; linux-
nvme@xxxxxxxxxxxxxxxxxxx; Limonciello, Mario; Linux PM; LKML
Subject: Re: [Regression] Commit "nvme/pci: Use host managed power state for
suspend" has problems


[EXTERNAL EMAIL]

Hi Rafael,

at 17:51, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:

Hi Keith,

Unfortunately,

commit d916b1be94b6dc8d293abed2451f3062f6af7551
Author: Keith Busch <keith.busch@xxxxxxxxx>
Date: Thu May 23 09:27:35 2019 -0600

nvme-pci: use host managed power state for suspend

doesn't universally improve things. In fact, in some cases it makes
things worse.

For example, on the Dell XPS13 9380 I have here it prevents the processor
package
from reaching idle states deeper than PC2 in suspend-to-idle (which, of
course, also
prevents the SoC from reaching any kind of S0ix).

That can be readily explained too. Namely, with the commit above the
NVMe device
stays in D0 over suspend/resume, so the root port it is connected to also
has to stay in
D0 and that "blocks" package C-states deeper than PC2.

In order for the root port to be able to go to D3, the device connected
to it also needs
to go into D3, so it looks like (at least on this particular machine, but
maybe in
general), both D3 and the NVMe-specific PM are needed.

Well this is really unfortunate to hear. I recall that with some disks we were
seeing problems where NVME specific PM wasn't working when the disk was in D3.

On your specific disk, it would be good to know if just removing the pci_save_state(pdev)
call helps.

Yes, it does help.

If so, :
* that might be a better option to add as a parameter.
* maybe we should double check all the disks one more time with that tweak.

At this point it seems so.

I'm not sure what to do here, because evidently there are systems where
that commit
helps. I was thinking about adding a module option allowing the user to
override the
default behavior which in turn should be compatible with 5.2 and earlier
kernels.

I just briefly tested s2i on XPS 9370, and the power meter shows a 0.8~0.9W
power consumption so at least I donât see the issue on XPS 9370.

To me that confirms NVME is down, but it still seems higher than I would have
expected. We should be more typically in the order of ~0.3W I think.

From what Iâve observed, ~0.8W s2idle is already better than Windows (~1W).
0.3W is what I see during S5.


It may go to PC10, but not reach S0ix.

Anyway, I run the s2idle tests under turbostat which then tells me
what has happened more precisely.

The XPS 9370 at my hand does reach to s0ix during s2idle:
# cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec
15998400

So I think keep the root port in D0 is not the culprit here.
Maybe something is wrong on the ASPM settings?

Kai-Heng