Re: [PATCH v5] PCI/ACPI: PCI/ACPI: Validate devices with power resources support D3

From: Limonciello, Mario
Date: Fri Nov 11 2022 - 13:58:45 EST


On 11/11/2022 11:41, Bjorn Helgaas wrote:
On Mon, Oct 31, 2022 at 05:33:55PM -0500, Mario Limonciello wrote:
Firmware typically advertises that ACPI devices that represent PCIe
devices can support D3 by a combination of the value returned by
_S0W as well as the HotPlugSupportInD3 _DSD [1].

`acpi_pci_bridge_d3` looks for this combination but also contains
an assumption that if an ACPI device contains power resources the PCIe
device it's associated with can support D3. This was introduced
from commit c6e331312ebf ("PCI/ACPI: Whitelist hotplug ports for
D3 if power managed by ACPI").

Some firmware configurations for "AMD Pink Sardine" do not support
wake from D3 in _S0W for the ACPI device representing the PCIe root
port used for tunneling. The PCIe device will still be opted into
runtime PM in the kernel [2] because of the logic within
`acpi_pci_bridge_d3`. This currently happens because the ACPI
device contains power resources.

When the thunderbolt driver is loaded two device links are created:
* USB4 router <- PCIe root port for tunneling
* USB4 router <- XHCI PCIe device

These device links are created because the ACPI devices declare the
`usb4-host-interface` _DSD [3]. For both links the USB4 router is the
supplier and these other devices are the consumers.
Here is a demonstration of this topology that occurs:

|
├─ 00:03.1
| | "PCIe root port used for tunneling"
| | ACPI Path: \_SB_.PCI0.GP11
| | ACPI Power Resources: Yes

I guess this means it has _PR0 and/or _PS0? (same for devices below)

Yeah.


| | ACPI _S0W return value: 0
| | Device Links: supplier:pci:0000:c4:00.5
| └─ PCIe Power state: D0

What bus does 00:03.1 lead to? I guess it doesn't lead to any of the
devices below?

By default - nothing. The topology was drawn correctly.

When you plug in a USB4 device with PCIe the newly enumerated devices hang off this bus.


└─ 00:08.3
| ACPI Path: \_SB_.PCI0.GP19
├─ PCIe Power state: D0

I guess 00:08.3 is a Root Port leading to bus c4?


Right.

├─ c4:00.3
| | "XHCI PCIe device used for tunneling"
| | ACPI Path: \_SB_.PCI0.GP19.XHC3
| | ACPI Power Resources: Yes
| | ACPI _S0W return value: 4
| | Device Links: supplier:pci:0000:c4:00.5
| └─ PCIe Power state: D3cold
└─ c4:00.5
| "USB4 Router"
| ACPI Path: \_SB_.PCI0.GP19.NHI0
| ACPI Power Resources: Yes
| ACPI _S0W return value: 4
| Device Links: consumer:pci:0000:00:03.1 consumer:pci:0000:c4:00.3
└─ PCIe Power state: D3cold

I'm reading the excellent Documentation/driver-api/device_link.rst and
trying to match up with this topology. This is a case where 00:03.1
is a consumer of c4:00.5. The driver core automatically enforces that
children are suspended before parents, but since 00:03.1 is an aunt
(not a child) of c4:00.5, the device link enforces the requirement
that 00:03.1 be suspended before c4:00.5. Right?

Yup!


Is c4:00.5 an NHI?

Yes.


The PCI power states shown above are the failure case? c4:00.5
*should* remain in D0 as long as 00:03.1 is in D0, but was instead put
in D3cold?

Yes.


Currently runtime PM is allowed for all of these devices.

This is because they all have _PR0 and/or _PS0? (Diagram doesn't show
that for 00:08.3, but I assume it must?)

For the root port used for tunneling it's because pci_bridge_d3_possible() returns TRUE.
This returns true because platform_pci_bridge(d3) return TRUE.

For the NHI it's because thunderbolt.ko sets this for all NHIs (see drivers/thunderbolt/nhi.c).


And I guess they all must have dev->is_hotplug_bridge set?

The PCIe root port for tunneling does; yes.


This means that
when all consumers are idle long enough, they will runtime suspend to
enter their deepest allowed sleep state. Once all consumers are in their
deepest allowed sleep state the suppliers will enter the deepest sleep
state as well.

* The PCIe root port for tunneling doesn't support waking from D3hot or
D3cold so although runtime suspended it stays in D0.
* The XHCI PCIe device supports wakeup from D3cold so it runtime suspends
to D3cold.
* Both consumers are in their deepest state and the USB4 router supports
wakeup from D3cold, so it runtime suspends into this state.

At least for AMD's case the hardware designer's expectation is the USB4
router should have also remained in D0 since the PCIe root port for
tunneling remained in D0 and a device link exists between the two devices.
The current Linux behavior is undefined.

Is the requirement that the supplier can never be in a lower-power
state than the consumer?

I guess that's *not* actually a requirement even though that's the
effect of this patch in this situation. If it *were* that simple, we
would just check the supplier and consumer power states instead of
looking at all these ACPI properties.

Yeah I think that's too broad of a generalization.

I don't know how realistic this is but for example what if the supplier supported D3 but the consumer supported D2?


Instead of making the assertion that the device can support D3 (and thus
runtime PM) solely from the presence of ACPI power resources, move the check
to later on in the function, which will have validated that the ACPI device
supports wake from D3hot or D3cold.

This fix prevents the USB4 router being put into D3 when the firmware
says that ACPI device representing the PCIe root port for tunneling can't
handle it while still allowing ACPI devices that don't have the
HotplugSupportInD3 _DSD to also enter D3 if they have power resources that
can wake from D3.

So I guess this changes what acpi_pci_bridge_d3() returns for 00:03.1?
Previously it returned "true" because it has _PR0/_PS0 (so
acpi_pci_power_manageable() is true), but now it will return "false"
because _S0W says it can only wake from D0?


Exactly.