Re: [PATCH] PCI: Call _REG when saving/restoring PCI state

From: Limonciello, Mario
Date: Fri Jun 02 2023 - 17:57:40 EST



On 6/2/2023 3:21 PM, Bjorn Helgaas wrote:
[+cc Rafael, Len, linux-acpi]

Hi Mario,

On Thu, Jun 01, 2023 at 10:11:22PM -0500, Mario Limonciello wrote:
ASMedia PCIe GPIO controllers connected to AMD SOC fail functional tests
after returning from s2idle. This is because the BIOS checks whether the
OSPM has called the _REG method to determine whether it can interact with
the OperationRegion assigned to the device.
"s2idle" is a Linux term; I'd prefer something that we can relate to
the ACPI spec.
It's important for the symptoms of this issue though, this
problem doesn't trigger "just" by moving D-states.

It happens as a result of system suspend.


Maybe a pointer to the specific function in the driver that has a
problem? Based on the patch, I assume the driver uses some control
method that looks at PCI config space?

The issue isn't in anything Linux code "does"; it's in the "lack"
of Linux code doing what it needs to IE using _REG.

At least for this issue _REG is treated like a lock mechanism.
In the spec it says specifically:

"When an operation region handler is unavailable, AML cannot access
data fields in that region".

That is it's to ensure that OSPM and AML don't both simultaneously
access the same region.

What happens is that AML normally wants to access this region during
suspend, but without the sequence of calling _REG it can't.


To fix this issue, call acpi_evaluate_reg() when saving and restoring the
state of PCI devices.
Please include the spec citation: ACPI r6.5, sec 6.5.4. The URL has
changed in the past and may change in the future, but the name/section
number will not.
Sure.

Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/06_Device_Configuration/Device_Configuration.html#reg-region
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
drivers/pci/pci.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e38c2f6eebd4..071ecba548b0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1068,6 +1068,12 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
return acpi_pci_bridge_d3(dev);
}
+static inline int platform_toggle_reg(struct pci_dev *dev, int c)
+{
+ return acpi_evaluate_reg(ACPI_HANDLE(&dev->dev),
+ ACPI_ADR_SPACE_PCI_CONFIG, c);
+}
You never check the return value, so why return it?

_REG isn't mandatory for any of these uses, and I wanted to make
sure that if it does end up being mandatory in a future use that
the return code wasn't thrown away.  If you think it's better to
just throw it away now, I have no qualms making it a void instead.


The function actually doesn't *toggle*; it connects or disconnects
based on "c".
Can you suggest a better function name?

This looks like it only builds when CONFIG_ACPI=y?

The prototype for acpi_evaluate_reg isn't guarded by CONFIG_ACPI
so I figured it worked both ways.

But looking again I don't see a dummy implementation version for
the lack of CONFIG_ACPI, so I'll double check it.


/**
* pci_update_current_state - Read power state of given device and cache it
* @dev: PCI device to handle.
@@ -1645,6 +1651,9 @@ static void pci_restore_ltr_state(struct pci_dev *dev)
int pci_save_state(struct pci_dev *dev)
{
int i;
+
+ platform_toggle_reg(dev, ACPI_REG_DISCONNECT);
I would expect these to be in the PM code near the power state
transitions, not in the state save/restore code. These functions
*are* used during suspend/resume, but are used in other places as
well, where we probably don't want _REG executed.

Cc'd Rafael and PM folks, who can give much better feedback.
My knee jerk reaction when we found the root cause for this issue
was to put the code right around the D-state transitions, but I
decided against this.

I put it in save/restore intentionally because
like I mentioned above it's treated like a locking mechanism between
OSPM and AML and it's not functionally tied to a D-state transition.

When the state is saved it's like Linux says
"I'm done using this region, go ahead and touch it firmware".
When it's restored it's like Linux says
"Don't use that region, I'm claiming it for now".

Think about that other patch I wrote recently that controls D3
availability [1].  If it was only run in the D-state transitions and
the root port stays in D0 but has a _REG method it would never get
called.

/* XXX: 100% dword access ok here? */
for (i = 0; i < 16; i++) {
pci_read_config_dword(dev, i * 4, &dev->saved_config_space[i]);
@@ -1790,6 +1799,8 @@ void pci_restore_state(struct pci_dev *dev)
pci_enable_acs(dev);
pci_restore_iov_state(dev);
+ platform_toggle_reg(dev, ACPI_REG_CONNECT);
+
dev->state_saved = false;
}
EXPORT_SYMBOL(pci_restore_state);
@@ -3203,6 +3214,7 @@ void pci_pm_init(struct pci_dev *dev)
pci_read_config_word(dev, PCI_STATUS, &status);
if (status & PCI_STATUS_IMM_READY)
dev->imm_ready = 1;
+ platform_toggle_reg(dev, ACPI_REG_CONNECT);
}
static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
--
2.34.1
[1] https://lore.kernel.org/linux-pci/20230530163947.230418-2-mario.limonciello@xxxxxxx/