Re: [PATCH] PCI/ASPM: Should not report ASPM support to BIOS if FADT indicates ASPM is unsupported

From: Manyi Li
Date: Mon Aug 15 2022 - 05:03:18 EST




在 2022/7/15 22:07, Rafael J. Wysocki 写道:
On Fri, Jul 15, 2022 at 2:24 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:

On Fri, Jul 15, 2022 at 9:40 AM Manyi Li <limanyi@xxxxxxxxxxxxx> wrote:



On 2022/7/14 11:20, Kai-Heng Feng wrote:
[+Cc Matthew]

On Thu, Jul 14, 2022 at 2:28 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

[+cc Kai-Heng, Vidya, who also have ASPM patches in flight]

On Wed, Jul 13, 2022 at 07:26:12PM +0800, Manyi Li wrote:
Startup log of ASUSTeK X456UJ Notebook show:
[ 0.130563] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 48.092472] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 48.092479] pcieport 0000:00:1c.5: device [8086:9d15] error status/mask=00000001/00002000
[ 48.092481] pcieport 0000:00:1c.5: [ 0] RxErr
[ 48.092490] pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
[ 48.092504] pcieport 0000:00:1c.5: AER: can't find device of ID00e5
[ 48.092506] pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5

Can you elaborate on the connection between the FADT ASPM bit and the
AER logs above?

Sorry,I don't know about that.


What problem are we solving here? A single corrected error being
logged? An infinite stream of errors? A device that doesn't work at
all?

Agree, what's the real symptom of the issue?

Please see the details of this issus:
https://bugzilla.kernel.org/show_bug.cgi?id=216245



We don't need the dmesg timestamps unless they contribute to
understanding the problem. I don't think they do in this case.

According to commit 387d37577fdd ("PCI: Don't clear ASPM bits when the
FADT declares it's unsupported"), the bit means "just use the ASPM
bits handed over by BIOS".

However, I do wonder why both drivers/pci/pci-acpi.c and
drivers/acpi/pci_root.c are doing the ACPI_FADT_NO_ASPM check,

Because pci_root.c doesn't read aspm_disabled.

I've recalled a bit in the meantime.

First off, ACPI_FADT_NO_ASPM forbids the OS from enabling ASPM control
(quite literally). It doesn't mean that the OS should not enumerate
ASPM and it doesn't mean that it should not report ASPM support to the
firmware via _OSC. Moreover, there are (or at least there were)
systems where the firmware expected ASPM support to be reported via
_OSC anyway (see commit 8b8bae901ce2 PCI/ACPI: Report ASPM support to
BIOS if not disabled from command line).

Thus, if ASPM is not disabled from command line, it would be
consistent to carry out the _OSC negotiation as usual regardless of
ACPI_FADT_NO_ASPM and then handle the case in which it is set in the
same way as the case in which the firmware doesn't grant the kernel
control of some PCIe features. Does this sound reasonable

This sound reasonable.


If it does, I think that ASPM should be enumerated regardless of
ACPI_FADT_NO_ASPM, but we need to ensure that its configuration is not
changed in any way if ACPI_FADT_NO_ASPM is set and I'm not sure if
that is the case now.

Of course, the same needs to happen when the kernel doesn't get full
control over PCIe features via _OSC, but AFAICS that case is handled
in the same way as the above already.

maybe one of them should be removed?

Arguably, pci_root.c could look at aspm_disabled instead of looking at
the FADT flag directly.

Second, if the former does sound reasonable, I'd rather avoid setting
aspm_disabled from drivers/pci/pci-acpi.c upfront when
ACPI_FADT_NO_ASPM is set, because doing that is not consistent with
the above.

Now, there may be BIOSes that don't expect to be informed of the OS
support for ASPM via _OSC if ACPI_FADT_NO_ASPM is set, and the
question is what to do with them. They clearly are at odds with the
BIOSes that do expect that to happen (mentioned above), so honestly
I'm not sure.

I'm not sure my issues is caused by report ASPM support to the firmware via _OSC.My issues is the same as this link:
https://groups.google.com/g/fa.linux.kernel/c/0uz8Nr_NVOI

Links to other discussions on this issue:
https://lore.kernel.org/all/20151229155822.GA17321@localhost/T/#u
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173


I think duplicate work has been done, but comment
in drivers/acpi/pci_root.c is
* We want to disable ASPM here, but aspm_disabled
* needs to remain in its state from boot so that we
* properly handle PCIe 1.1 devices. So we set this
* flag here, to defer the action until after the ACPI
* root scan.

I don't understand this logic.

This is about the case after failing acpi_pci_osc_control_set() and
generally we need to defer setting aspm_disabled because of
pcie_aspm_sanity_check().



Signed-off-by: Manyi Li <limanyi@xxxxxxxxxxxxx>
---
drivers/pci/pcie/aspm.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index a96b7424c9bc..b173d3c75ae7 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1359,6 +1359,7 @@ void pcie_no_aspm(void)
if (!aspm_force) {
aspm_policy = POLICY_DEFAULT;
aspm_disabled = 1;
+ aspm_support_enabled = false;

This makes pcie_no_aspm() work the same as booting with
"pcie_aspm=off". That might be reasonable.

I do wonder why we need both "aspm_disabled" and
"aspm_support_enabled". And I wonder why we need to set "aspm_policy"
when we're disabling ASPM. But those aren't really connected to your
change here.

From what I can understand "aspm_disabled" means "don't touch ASPM
left by BIOS", and "aspm_support_enabled" means "whether ASPM is
disabled via command line".
There seems to be some overlaps though.

According to commit 8b8bae901ce23 ("PCI/ACPI: Report ASPM support to
BIOS if not disabled from command line"), "aspm_support_enabled" means
whether or not report ASPM support to the BIOS through _OSC.

Right.


--
Manyi Li