Re: [PATCH v11 9/9] PCI: ACPI: Use device constraints to decide PCI target state fallback policy

From: Bjorn Helgaas
Date: Wed Aug 16 2023 - 18:39:28 EST


[I see that you just posted a v12 that doesn't touch drivers/pci at
all. I haven't looked at it yet, so maybe my questions/comments below
are no longer relevant.]

On Wed, Aug 16, 2023 at 07:57:52AM -0500, Limonciello, Mario wrote:
> On 8/15/2023 6:48 PM, Bjorn Helgaas wrote:
> > On Wed, Aug 09, 2023 at 01:54:53PM -0500, Mario Limonciello wrote:
> > > Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > > PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
> > > storing a value to the `bridge_d3` variable in the `struct pci_dev`
> > > structure.
> > >
> > > pci_power_manageable() uses this variable to indicate a PCIe port can
> > > enter D3.
> > > pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
> > > decide whether to try to put a device into its target state for a sleep
> > > cycle via pci_prepare_to_sleep().
> > >
> > > For devices that support D3, the target state is selected by this policy:
> > > 1. If platform_pci_power_manageable():
> > > Use platform_pci_choose_state()
> > > 2. If the device is armed for wakeup:
> > > Select the deepest D-state that supports a PME.
> > > 3. Else:
> > > Use D3hot.
> > >
> > > Devices are considered power manageable by the platform when they have
> > > one or more objects described in the table in section 7.3 of the ACPI 6.5
> > > specification.
> > >
> > > When devices are not considered power manageable; specs are ambiguous as
> > > to what should happen. In this situation Windows 11 leaves PCIe
> > > ports in D0 while Linux puts them into D3 due to the above mentioned
> > > commit.
> >
> > Why would we not use the same policy as Windows 11?
>
> That's what I'm aiming to do with my patch series.

OK, help me out because I think I have a hint of how this works, but
I'm still really confused. Here's the sort of commit log I envision
(but I know it's all wrong, so help me out):

Iain reported that the Lenovo Thinkpad Z13 suspends but activity on
plugged-in USB devices will not cause a resume. The resume failed
because the Root Port leading to the USB adapter was in D3hot, and
... maybe firmware can't identify the wakeup source? I dunno?

The Z13 supplies an ACPI PNP0D80 System Power Management Controller
with a "Get Device Constraints" _DSM method that says the Root Port
should not be put into a lower power state than D0 ??

[This is one thing that seems completely wrong. The _DSM spec says
the device must be in the specified or DEEPER D-state to achieve the
lowest power platform idle state. That seems backwards.]

For power-manageable devices (those with _PS0 or _PR0, per ACPI
r6.5, sec 7.3), platform_pci_choose_state() selects the low-power
state based on the device's _SxW, _SxD, and _PRW methods.

For non-power manageable devices (those without _PS0 or _PR0), ACPI
doesn't help decide the low-power state.

[This confuses me too. Even here pci_set_power_state() uses
pci_platform_power_transition(), i.e., ACPI, but I guess this only
uses _OFF/_ON and doesn't count as "power managed"?

Maybe that makes sense since _OFF/_ON are part of sec 7.2 ("Declaring
a Power Resource Object", which is separate from *7.3* ("Device Power
Management Objects") where _PR0, _PS0, _PRW, _SxD, _SxW, etc are.]

However, extensions like the ACPI PNP0D80 System Power Management
Controller device can provide constraints on the allowable low-power
states.

Add platform_get_constraint() so pci_target_state() can discover
constraints from these extensions and avoid choosing a deeper
low-power state than the platform allows.

[Again, this *seems* backwards to how that _DSM is documented.]

> > > In Windows systems that support Modern Standby specify hardware
> > > pre-conditions for the SoC to achieve the lowest power state by device
> > > constraints in a SOC specific "Power Engine Plugin" (PEP) [2] [3].
> > > They can be marked as disabled or enabled and when enabled can specify
> > > the minimum power state required for an ACPI device.
> > >
> > > When it is ambiguous what should happen, adjust the logic for
> > > pci_target_state() to check whether a device constraint is present
> > > and enabled.
> > > * If power manageable by ACPI use this to get to select target state
> > > * If a device constraint is present but disabled then choose D0
> > > * If a device constraint is present and enabled then use it
> > > * If a device constraint is not present, then continue to existing
> > > logic (if marked for wakeup use deepest state that PME works)
> > > * If not marked for wakeup choose D3hot
> >
> > Apparently this is based on constraints from the _DSM method of a
> > PNP0D80 device (and Intel x86 and AMD have different _DSM methods)?
> >
> > Isn't this something that needs to be part of the ACPI spec? I
> > don't see PNP0D80 mentioned there.
>
> So PNP0D80 is a "Windows-compatible system power management controller"
> See https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-management-namespace-objects#compatible-id-_cid
>
> > Obviously we also have ARM64 and other arches now using ACPI
>
> Let me quote a few select sections of [4] that I want to draw attention to:
>
> "Devices that are integrated into the SoC can be power-managed through the
> Windows Power Framework (PoFx). These framework-integrated devices are
> power-managed by PoFx through a SoC-specific power engine plug-in (microPEP)
> that knows the specifics of the SoC's power and clock controls.
> ...
> For peripheral devices that aren't integrated into the SoC, Windows uses
> ACPI Device Power Management.
> ...
> It is possible to, and some device stacks do, use ACPI Device Power
> Management alone, or in combination with the microPEP for on-SoC device
> power management.
> "
>
> [4] https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-power-management#device-power-management-in-windows
>
> In Linux we call the device that supports PNP0D80 the "LPS0 device", but in
> Windows they call it uPEP. In Windows the SOC vendor provides this driver
> and it supports the _DSM calls for the vendor.
>
> As part of going into Modern Standby on Windows that uPEP driver
> orchestrates the state of the devices mentioned by constraints.
>
> I'd like to think that's exactly what this patch is doing; it's giving the
> ability to select the target state for SOC specified constraints to the LPS0
> driver.
>
> Looking at the intertwined function calls again, I wonder if maybe it's
> better to move the constraint checking into pci_prepare_to_sleep().
>
> That makes it explicitly clear that LPS0 constraints shouldn't be used for
> anything other than suspend rather than also having implications into
> runtime PM.
>
> As for your comment for ARM64, I think if they use Windows uPEP constraints
> we can add an PNP0D80 compatible LPS0 driver for them too just the same.
>
> > so I'm not really comfortable with these bits scrounged from
> > Microsoft and Intel white papers. That seems hard to maintain in
> > the future, and this is hard enough now.
>
> Remember this all comes back to a PCI root port in the SOC being put
> into an unexpected D-state over suspend. The device is not ACPI
> power manageable so *that behavior* is up to the OSPM and is not
> grounded in any ACPI or PCI spec.
>
> TBH I think the Microsoft documentation is the best we're going to
> get here for this case. To be compatible with UEFI firmware
> designed for Windows machines you need to adopt some Windows-isms.
>
> If this was coreboot, I would tell the coreboot developers to mark
> this root port as ACPI power manageable and adjust the _SxD and _SxW
> objects so that this root port couldn't get into D3.
>
> > > Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [1]
> > > Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [2]
> > > Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [3]

This covers the Intel PNP0D80 _DSM. But
lpi_device_get_constraints_amd() implies that there's a similar but
different AMD version? Is there a published spec for the AMD _DSM?

> > > Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > > Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
> > > Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> > > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> > > ---
> > > v10->v11:
> > > * Fix kernel kernel build robot errors and various sparse warnings
> > > related to new handling of pci_power_t.
> > > * Use the new helpers introduced in previous patches
> > > ---
> > > drivers/pci/pci-acpi.c | 12 ++++++++++++
> > > drivers/pci/pci.c | 17 ++++++++++++++++-
> > > drivers/pci/pci.h | 5 +++++
> > > 3 files changed, 33 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > index b5b65cdfa3b8b..9f418ec87b6a5 100644
> > > --- a/drivers/pci/pci-acpi.c
> > > +++ b/drivers/pci/pci-acpi.c
> > > @@ -1081,6 +1081,18 @@ bool acpi_pci_bridge_d3(struct pci_dev *dev)
> > > return false;
> > > }
> > > +/**
> > > + * acpi_pci_device_constraint - return any PCI power state constraints
> > > + * @dev: PCI device to check
> > > + *
> > > + * Returns: Any valid constraint specified by platform for a device.
> > > + * Otherwise PCI_POWER_ERROR.
> > > + */
> > > +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
> > > +{
> > > + return map_acpi_to_pci(acpi_get_lps0_constraint(&dev->dev));

The non-x86 acpi_get_lps0_constraint() stub returns -ENODEV, which
doesn't seem quite right because map_acpi_to_pci() assumes it gets
ACPI_STATE_D0, etc.

I think it happens to work out fine because if it gets something
that's not ACPI_STATE_*, map_acpi_to_pci() returns PCI_POWER_ERROR,
but that's unholy knowledge of the ACPI_STATE_* values.

> > > +}
> > > +
> > > static void acpi_pci_config_space_access(struct pci_dev *dev, bool enable)
> > > {
> > > int val = enable ? ACPI_REG_CONNECT : ACPI_REG_DISCONNECT;
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 693f4ca90452b..567443726974b 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1082,6 +1082,14 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
> > > return acpi_pci_bridge_d3(dev);
> > > }
> > > +static inline pci_power_t platform_get_constraint(struct pci_dev *dev)
> > > +{
> > > + if (pci_use_mid_pm())
> > > + return PCI_POWER_ERROR;
> > > +
> > > + return acpi_pci_device_constraint(dev);
> > > +}
> > > +
> > > /**
> > > * pci_update_current_state - Read power state of given device and cache it
> > > * @dev: PCI device to handle.
> > > @@ -2685,11 +2693,13 @@ static inline pci_power_t pci_get_wake_pme_state(struct pci_dev *dev)
> > > */
> > > static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
> > > {
> > > + pci_power_t state;
> > > +
> > > if (platform_pci_power_manageable(dev)) {
> > > /*
> > > * Call the platform to find the target state for the device.
> > > */
> > > - pci_power_t state = platform_pci_choose_state(dev);
> > > + state = platform_pci_choose_state(dev);
> > > switch (state) {
> > > case PCI_POWER_ERROR:
> > > @@ -2715,6 +2725,11 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
> > > else if (!dev->pm_cap)
> > > return PCI_D0;
> > > + /* if platform indicates preferred state device constraint, use it */
> > > + state = platform_get_constraint(dev);
> > > + if (state != PCI_POWER_ERROR)
> > > + return state;
> > > +
> > > if (wakeup && dev->pme_support)
> > > return pci_get_wake_pme_state(dev);
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index a4c3974340576..410fca4b88837 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -707,6 +707,7 @@ void pci_set_acpi_fwnode(struct pci_dev *dev);
> > > int pci_dev_acpi_reset(struct pci_dev *dev, bool probe);
> > > bool acpi_pci_power_manageable(struct pci_dev *dev);
> > > bool acpi_pci_bridge_d3(struct pci_dev *dev);
> > > +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev);
> > > int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
> > > pci_power_t acpi_pci_get_power_state(struct pci_dev *dev);
> > > void acpi_pci_refresh_power_state(struct pci_dev *dev);
> > > @@ -731,6 +732,10 @@ static inline bool acpi_pci_bridge_d3(struct pci_dev *dev)
> > > {
> > > return false;
> > > }
> > > +static inline pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
> > > +{
> > > + return PCI_POWER_ERROR;
> > > +}
> > > static inline int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > > {
> > > return -ENODEV;
> > > --
> > > 2.34.1
> > >