Re: [PATCH 2/2] PCI: vmd: Enable ASPM for mobile platforms

From: Kai-Heng Feng
Date: Thu Oct 08 2020 - 00:19:47 EST




> On Oct 7, 2020, at 21:30, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
>>> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>>> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
>>>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>>>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
>
> ...
>>>> I wonder whether other devices that add PCIe domain have the same
>>>> behavior? Maybe it's not a special case at all...
>>>
>>> What other devices are these?
>>
>> Controllers which add PCIe domain.
>
> I was looking for specific examples, not just a restatement of what
> you said before. I'm just curious because there are a lot of
> controllers I'm not familiar with, and I can't think of an example.
>
>>>> I understand the end goal is to keep consistency for the entire ASPM
>>>> logic. However I can't think of any possible solution right now.
>>>>
>>>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
>>>>> SoC power state problem?
>>>>
>>>> Yes.
>>>>
>>>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
>>>>
>>>> This will break many systems, at least for the 1st Gen Ryzen
>>>> desktops and laptops.
>>>>
>>>> All PCIe ASPM are not enabled by BIOS, and those systems immediately
>>>> freeze once ASPM is enabled.
>>>
>>> That indicates a defect in the Linux ASPM code. We should fix that.
>>> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
>>
>> On those systems ASPM are also not enabled on Windows. So I think
>> ASPM are disabled for a reason.
>
> If the platform knows ASPM needs to be disabled, it should be using
> ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it. And if
> CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
> shouldn't, that's a Linux bug that we need to fix.

Yes that's a bug which fixed by Ian's new patch.

>
>>> Are there bug reports for these? The info we would need to start with
>>> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
>>> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
>>> might be interesting, too. We'll likely need to add some
>>> instrumentation and do some experimentation, but in principle, this
>>> should be fixable.
>>
>> Doing this is asking users to use hardware settings that ODM/OEM
>> never tested, and I think the risk is really high.
>
> What? That's not what I said at all. I'm asking for information
> about these hangs so we can fix them. I'm not suggesting that you
> should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.

Ah, I thought your suggestion is switching to CONFIG_PCIEASPM_POWERSAVE=y, because I sense you want to use that to cover the VMD ASPM this patch tries to solve.

Do we have a conclusion how to enable VMD ASPM with CONFIG_PCIEASPM_DEFAULT=y?

>
> Let's back up. You said:
>
> CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
> for the 1st Gen Ryzen desktops and laptops.
>
> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> freeze once ASPM is enabled.
>
> These system hangs might be caused by (1) some hardware issue that
> causes a hang when ASPM is enabled even if it is configured correctly
> or (2) Linux configuring ASPM incorrectly.

It's (2) here.

>
> For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
> to prevent the OS from enabling ASPM. Linux should pay attention to
> that even when CONFIG_PCIEASPM_POWERSAVE=y.
>
> If the platform *should* use these mechanisms but doesn't, the
> solution is a quirk, not the folklore that "we can't use
> CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."

The platform in question doesn't prevent OS from enabling ASPM.

>
> For case (2), we should fix Linux so it configures ASPM correctly.
>
> We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
> hangs. We need to fix the Linux run-time code so the system operates
> correctly no matter what CONFIG_PCIEASPM setting is used.
>
> We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
> sysfs attributes for controlling ASPM link states")). They can do the
> same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
> build-time. If those knobs cause hangs on 1st Gen Ryzen systems, we
> need to fix that.

Ian's patch solves the issue, at least for the systems I have.

Kai-Heng

>
> Bjorn