Re: [PATCH v7 2/2] PCI: Don't put non-power manageable PCIe root ports into D3

From: Limonciello, Mario
Date: Wed Jul 12 2023 - 12:09:27 EST


On 7/12/2023 07:13, Rafael J. Wysocki wrote:
On Wed, Jul 12, 2023 at 12:54 AM Mario Limonciello
<mario.limonciello@xxxxxxx> wrote:

On 7/11/23 17:14, Bjorn Helgaas wrote:
[+cc Andy, Intel MID stuff]

On Mon, Jul 10, 2023 at 07:53:25PM -0500, Mario Limonciello wrote:
Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
PCIe ports from modern machines (>2015) are allowed to be put into D3 by
storing a flag in the `struct pci_dev` structure.

It looks like >= 2015 (not >2015). I think "a flag" refers to
"bridge_d3".

Yeah.


pci_power_manageable() uses this flag to indicate a PCIe port can enter D3.
pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
decide whether to try to put a device into its target state for a sleep
cycle via pci_prepare_to_sleep().

For devices that support D3, the target state is selected by this policy:
1. If platform_pci_power_manageable():
Use platform_pci_choose_state()
2. If the device is armed for wakeup:
Select the deepest D-state that supports a PME.
3. Else:
Use D3hot.

Devices are considered power manageable by the platform when they have
one or more objects described in the table in section 7.3 of the ACPI 6.4
specification.

No point in citing an old version, so please cite ACPI r6.5, sec 7.3.

The spec claims we only need one object from the table for a device to
be "power-managed", but in reality, it looks like the only things that
actually *control* power are _PRx (the _ON/_OFF methods of Power
Resources) and _PSx (ironically only mentioned parenthically).


Your point has me actually wondering if I've got this entirely wrong.

Should we perhaps be looking specifically for the presence of _SxW to
decide if a given PCIe port can go below D0?

There are two things, _SxW and _SxD, and they shouldn't be confused.

_SxW tells you what the deepest power state from which wakeup can be
signaled by the device (in the given Sx state of the system) is.

_SxD tells you what the deepest power state supported by the device
(in the given Sx state of the system) is.

And note that _SxW is applicable to the device itself, not the
subordinate devices, so I'm not sure how meaningful it is for ports.

pci_target_state() uses both _SxW and _SxD to determine the deepest
state the device can go into and so long as it is used properly, it
shouldn't return a power state that is too deep, so I'm not really
sure why you want this special "should the bridge be allowed to go
into D3hot/cold" routine to double check this.

Because pci_target_state only looks at _SxW and _SxD "if" the PCI device is power manageable by ACPI. That's why this change is injecting that extra check in.


IE very similar to what 8133844a8f24 did but for ports that are not
hotplug bridges.

This matches up well with acpi_pci_power_manageable(), which returns
true if a device has either _PR0 or _PS0.

Per ACPI r6.5, sec 7.3, ACPI control of device power states uses
Power Resources (i.e., the _ON/_OFF methods of _PRx) or _PSx
methods. Hence acpi_pci_power_manageable() checks for the presence
of _PR0 or _PS0.

Tangent unrelated to *this* patch: I don't know how to think about the
pci_use_mid_pm() in platform_pci_power_manageable() because I haven't
seen a MID spec. pci_use_mid_pm() isn't dependent on "dev", so we
claim *all* PCI devices, even external ones, are power manageable by
the platform, which doesn't seem right.

At suspend Linux puts PCIe root ports that are not power manageable by
the platform into D3hot. Windows only puts PCIe root ports into D3 when
they are power manageable by the platform.

The policy selected for Linux to put non-power manageable PCIe root ports
into D3hot at system suspend doesn't match anything in the PCIe or ACPI
specs.

Linux shouldn't assume PCIe root ports support D3 just because
they're on a machine newer than 2015, the ports should also be considered
power manageable by the platform.

Add an extra check for PCIe root ports to ensure D3 isn't selected for
them if they are not power-manageable through platform firmware.
This will avoid pci_pm_suspend_noirq() changing the power state
via pci_prepare_to_sleep().

The check is focused on PCIe root ports because they are part of
the platform. Other PCIe bridges may be connected externally and thus
cannot impose platform specific limitations.

Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/07_Power_and_Performance_Mgmt/device-power-management-objects.html [1]

At least for myself when I respin this, here is the 6.5 link.
https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects

Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
Acked-by: Rafael J. Wysocki <rafael@xxxxxxxxxx>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
v6->v7:
* revert back to v5 code, rewrite commit message to specific examples
and be more generic
---
drivers/pci/pci.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f916fd76eba79..4be8c6f8f4ebe 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3041,6 +3041,14 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
if (dmi_check_system(bridge_d3_blacklist))
return false;

+ /*
+ * It's not safe to put root ports that aren't power manageable
+ * by the platform into D3.

Does this refer specifically to D3cold?


No, it's intentionally not saying D3hot or D3cold because it's stored to
"bridge_d3" which is used for both D3hot and D3cold.

I assume that if we were talking about D3hot, we wouldn't need to
check for ACPI support because D3hot behavior should be fully covered
by the PCIe spec.

Right; the PCIe spec indicates that D3hot should be supported by all
devices and has rules about when you can go into D3hot like not allowing
it unless child devices are already in D3.

Naïvely you would think that means pci_bridge_d3_possible() shouldn't
have any of these checks, but they've all obviously all been grown for a
reason.

The value from pci_bridge_d3_possible() is used both "at suspend" and
"runtime", but what we're really talking with this issue is is the
behavior "at suspend".

Which is suspend-to-idle AFAICS, so from the ACPI perspective it is
all S0 anyway.


Let's be specific about D3hot vs D3cold whenever possible.

+ if (pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT &&
+ !platform_pci_power_manageable(bridge))
+ return false;

If ACPI says a device is not power-manageable, i.e., ACPI doesn't know
how to put it in D0, it makes sense to return "false" here so we don't
try to put it in D3cold.

But I don't understand the ROOT_PORT check. We may have a Switch
described via ACPI, and the ROOT_PORT check means we can return "true"
(it's OK to use D3cold) even if the Switch Port is not power-manageable
via ACPI.

This feels a lot more of a potential to cause regressions.

Something we could do is include the value for bridge->untrusted in the
decision, but I'm not convinced that's correct.

Another option can be to merge a series of changes like this:

1) My v5 patch that adds the quirks for the two known broken machines
2) Patch 1/2 from v7
3) Patch 2/2 from v7
4) Another patch to drop the root port check here

#1 could go to 6.5-rcX as it's riskless. #2-4 could go to linux-next
and if they work out not to cause any problems we could revert #1.

If they cause problems we come back to the drawing table to find the
right balance.

Generally speaking, pci_bridge_d3_possible() is there to prevent
bridges (and PCIe ports in particular) from being put into D3hot/cold
if there are reasons to believe that it may not work.
acpi_pci_bridge_d3() is part of that.

Even if it returns 'true', the _SxD/_SxW check should still be applied
via pci_target_state() to determine whether or not the firmware allows
this particular bridge to go into D3hot/cold. So arguably, the _SxW
check in acpi_pci_bridge_d3() should not be necessary and if it makes
any functional difference, there is a bug somewhere else.

But only if it was power manageable would the _SxD/_SxW check be applied. This issue is around the branch of pci_target_state() where
it's not power manageable and so it uses PME or it falls back to D3hot.