Re: [PATCH v15 2/2] platform/x86/amd: pmc: Don't let PCIe root ports go into D3

From: Hans de Goede
Date: Mon Aug 28 2023 - 05:20:40 EST


Hi Mario,

On 8/28/23 06:28, Mario Limonciello wrote:
> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> changed pci_bridge_d3_possible() so that any vendor's PCIe ports
> from modern machines (>=2015) are allowed to be put into D3.
>
> Iain reports that USB devices can't be used to wake a Lenovo Z13
> from suspend. This is because the PCIe root port has been put
> into D3 and AMD's platform can't handle USB devices waking from
> a hardware sleep state in this case.
>
> This problem only occurs on Linux, and only when the AMD PMC driver
> is utilized to put the device into a hardware sleep state. Comparing
> the behavior on Windows and Linux, Windows doesn't put the root ports
> into D3.
>
> A variety of approaches were discussed to change PCI core to handle this
> case generically but no consensus was reached. To limit the scope of
> effect only to the affected machines introduce a workaround into the
> amd-pmc driver to only apply to the PCI root ports in affected machines
> when going into hardware sleep.
>
> Link: https://lore.kernel.org/linux-pci/20230818193932.27187-1-mario.limonciello@xxxxxxx/
> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> ---
> drivers/platform/x86/amd/pmc/pmc.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
> index eb2a4263814c..f7bfe704ce39 100644
> --- a/drivers/platform/x86/amd/pmc/pmc.c
> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> @@ -741,6 +741,21 @@ static int amd_pmc_czn_wa_irq1(struct amd_pmc_dev *pdev)
> return 0;
> }
>
> +static int amd_pmc_rp_wa(struct amd_pmc_dev *pdev)
> +{
> + struct pci_dev *pci_dev = NULL;
> +
> + while ((pci_dev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_ANY_ID, pci_dev))) {
> + if (!pci_is_pcie(pci_dev) ||
> + !(pci_pcie_type(pci_dev) == PCI_EXP_TYPE_ROOT_PORT))
> + continue;
> + pci_dev->bridge_d3 = 0;
> + dev_info_once(pdev->dev, "Disabling D3 for PCIe root ports\n");
> + }
> +
> + return 0;
> +}
> +
> static int amd_pmc_verify_czn_rtc(struct amd_pmc_dev *pdev, u32 *arg)
> {
> struct rtc_device *rtc_device;
> @@ -893,6 +908,10 @@ static int amd_pmc_suspend_handler(struct device *dev)
> case AMD_CPU_ID_CZN:
> rc = amd_pmc_czn_wa_irq1(pdev);
> break;
> + case AMD_CPU_ID_YC:
> + case AMD_CPU_ID_PS:
> + rc = amd_pmc_rp_wa(pdev);
> + break;
> default:
> break;
> }


I'm fine with moving this into the amd-pmc code, but I have some questions about the current approach:

1. The current approach sets pci_dev->bridge_d3 = 0 for all root ports, I assume this WA is indeed necessary for all root ports and not just for one specific root port ?

2. The current approach runs from the suspend pm-op for the PCI-device for the PMC. So when it runs we know that the root-port for the PMC will not have been suspended yet. But what is stopping other root ports, which already have had all their children run-time suspended before the system-suspend, from already being in suspended state and thus possibly in D3 state ?

And we also cannot just set pci_dev->bridge_d3 = 0 once on probe time since pci_bridge_d3_possible() is called every time pci devices are added/removed so then it may get reset to 1 again.

What I think is necessary here and what I hope will be acceptable to Bjorn, is for platform code to be able to register a callback to be called from pci_bridge_d3_possible() which can veto the decision to use d3. This way we don't pollute the PCI core with this, while still allowing platform specific tweaks.

If we make this a sorted list of callbacks (allowing to specify a priority at register time)
instead of just 1 callback the the 2015 BIOS date check could be move to arch/x86 and the DMI blacklist can probably also be moved there.

And the platform_pci_bridge_d3() check can then also be a callback registered by the ACPI code.

Regards,

Hans