Re: [PATCH v14.a 1/1] PCI: Only put Intel PCIe ports >= 2015 into D3

From: Limonciello, Mario
Date: Mon Aug 21 2023 - 22:33:01 EST




On 8/21/2023 5:42 PM, Bjorn Helgaas wrote:
On Fri, Aug 18, 2023 at 02:39:32PM -0500, Mario Limonciello wrote:
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
changed pci_bridge_d3_possible() so that any vendor's PCIe ports
from modern machines (>=2015) are allowed to be put into D3.

Iain reports that USB devices can't be used to wake a Lenovo Z13
from suspend. This is because the PCIe root port has been put
into D3 and AMD's platform can't handle USB devices waking in this
case.

This behavior is only reported on Linux. Comparing the behavior
on Windows and Linux, Windows doesn't put the root ports into D3.

To fix the issue without regressing existing Intel systems,
limit the >=2015 check to only apply to Intel PCIe ports.

Cc: stable@xxxxxxxxxxxxxxx
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
Reviewed-by:Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
In v14 this series has been split into 3 parts.
part A: Immediate fix for AMD issue.
part B: LPS0 export improvements
part C: Long term solution for all vendors
v13->v14:
* Reword the comment
* add tag
v12->v13:
* New patch
---
drivers/pci/pci.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0c..bfdad2eb36d13 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3037,10 +3037,15 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
return false;
/*
- * It should be safe to put PCIe ports from 2015 or newer
- * to D3.
+ * Allow Intel PCIe ports from 2015 onward to go into D3 to
+ * achieve additional energy conservation on some platforms.
+ *
+ * This is only set for Intel PCIe ports as it causes problems
+ * on both AMD Rembrandt and Phoenix platforms where USB keyboards
+ * can not be used to wake the system from suspend.
*/
- if (dmi_get_bios_year() >= 2015)
+ if (bridge->vendor == PCI_VENDOR_ID_INTEL &&
+ dmi_get_bios_year() >= 2015)
return true;

Hmm. I'm really not a fan of checks like this that aren't connected
to an actual property of the platform. The Intel Vendor ID tells us
nothing about what the actual problem is, which makes it really hard
to maintain in the future. It's also very AMD- and Intel-centric,
when this code is ostensibly arch-agnostic, so this potentially
regresses ARM64, RISC-V, powerpc, etc.

It's bad enough that we check for 2015. A BIOS security update to a
2014 platform will break things, even though the update has nothing to
do with D3. We're stuck with that one, and it's old enough that maybe
it won't bite us any more, but I hate to add more.

I don't see this change as adding any more checks.
It's correcting what should have been a more narrow check when that one was introduced.

There was no spec backing the 2015 date, it was just "hardware works with this and it's needed".


The list of conditions in pci_bridge_d3_possible() is a pretty good
clue that we don't really know what we're doing, and all we can do is
find configurations that happen to work.


I see this function as stemming from three high level desires that intersect.

1) Make hardware not originally designated for the PC ecosystem work.
Macs fall in this camp. They don't always adhere to the same "firmware norms" as UEFI PCs do. They don't run Microsoft's certifications, and thus they don't always work the same.

2) Make hardware compatible to the the ACPI and PCI specs where possible.

3) Make hardware compatible with Windows behavior.

In a strict world <2> would be the only one that was followed and everything else would be relegated to quirks or sub-drivers that made decisions, but we are where we are.

I'm not saying it's a bad thing that all 3 goals overlap.
If not for the changes that were introduced into this function for Mac compatibility I don't think we would Thunderbolt/USB4 where it is today.

I don't have any better suggestions, other than that this should be
described somehow via ACPI (and not in vendor-specific stuff like
PNP0D80).

Bjorn

The problem is the ACPI spec doesn't say what OSPM should do when something isn't power manageable by ACPI. Can you really argue it should?

Even if we DID manage to get something added to the spec how does that help everything already on the marketplace that's broken?

If we can't fix the check to be more narrow the only options I see left:

0) Do nothing.

Document somewhere that if you're on AMD and care about wake from keyboard that you need pcie_port_pm=off. I really don't think this is a good idea. Here's why:

Completely separate from this wake from USB issue I know of another issue where some PHX BIOS versions HANG the system during resume because of root port being in D3.

It's fixed in newer PHX BIOS versions, but if you end up with an OEM system that happened to launch with one of these you're in a very bad place. I haven't mentioned it because I've not seen an OEM system with this yet.

If we "do nothing" the only way to solve those will be to grow the DMI avoid list if any of these come up.

1) Hardcode all the Intel root ports that need D3 into a quirk list.

I don't know how big this list is, but I assume it's massive and doesn't scale unless the constraints stuff works for Intel to opt in the newer
things.

2) Hardcode quirks for all the affected AMD PCI root ports to avoid D3.

It's 4 of them for the 2022-2023 platforms.
It will probably be another 2 more for 2024 ones, and then another 2 more for 2025 ones.
Maybe by 2025 the affected root port can handle D3 and let USB wake it up? I don't know.

3) Move the quirk somewhere that AMD can maintain specifically for this case outside of drivers/pci.

I did prototype it being put into drivers/platform/x86/amd/pmc.c or drivers/platform/x86/amd/pmc_quirks.c instead as part of the suspend callbacks or LPS0 callbacks.

This works and can scale as that driver at a minimum gets new IDs added for new platforms. However it makes all this logic much more convoluted and harder for anyone to follow.