Re: [PATCH v2 1/6] PCI/ASPM: Add locked helper for enabling link state

From: Bjorn Helgaas
Date: Wed Dec 13 2023 - 15:45:18 EST


On Wed, Dec 13, 2023 at 11:48:41AM -0800, David E. Box wrote:
> On Tue, 2023-12-12 at 15:27 -0600, Bjorn Helgaas wrote:
> > On Tue, Dec 12, 2023 at 11:48:27AM +0800, Kai-Heng Feng wrote:
> > > On Fri, Dec 8, 2023 at 4:47 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > ...
> >
> > > > I hope we can obsolete this whole idea someday.  Using pci_walk_bus()
> > > > in qcom and vmd to enable ASPM is an ugly hack to work around this
> > > > weird idea that "the OS isn't allowed to enable more ASPM states than
> > > > the BIOS did because the BIOS might have left ASPM disabled because it
> > > > knows about hardware issues."  More history at
> > > > https://lore.kernel.org/linux-pci/20230615070421.1704133-1-kai.heng.feng@xxxxxxxxxxxxx/T/#u
> > > >
> > > > I think we need to get to a point where Linux enables all supported
> > > > ASPM features by default.  If we really think x86 BIOS assumes an
> > > > implicit contract that the OS will never enable ASPM more
> > > > aggressively, we might need some kind of arch quirk for that.
> > >
> > > The reality is that PC ODM toggles ASPM to workaround hardware
> > > defects, assuming that OS will honor what's set by the BIOS.
> > > If ASPM gets enabled for all devices, many devices will break.
> >
> > That's why I mentioned some kind of arch quirk.  Maybe we're forced to
> > do that for x86, for instance.  But even that is a stop-gap.
> >
> > The idea that the BIOS ASPM config is some kind of handoff protocol is
> > really unsupportable.
>
> To be clear, you are not talking about a situation where
> ACPI_FADT_NO_ASPM or _OSC PCIe disallow OS ASPM control, right?
> Everyone agrees that this should be honored? The question is what to
> do by default when the OS is not restricted by these mechanisms?

Exactly. The OS should respect ACPI_FADT_NO_ASPM and _OSC.

I think there are a couple exceptions where we want to disable ASPM
even if the platform said the OS shouldn't touch ASPM at all, but
that's a special case.

> Reading the mentioned thread, I too think that using the BIOS config
> as the default would be the safest option, but only to avoid
> breaking systems, not because of an implied contract between the
> BIOS and OS. However, enabling all capable ASPM features is the
> ideal option. If the OS isn't limited by ACPI_FADT_NO_ASPM or _OSC
> PCIe, then ASPM enabling is fully under its control. If this
> doesn't work for some devices then they are broken and need a quirk.

Agreed. It may not be practical to identify such devices, so we may
need a broader arch-based and/or date-based quirk.

I'd be shocked if Windows treated the BIOS config as a "do not exceed
this" situation, so my secret hope is that some of these "broken"
devices are really caused by defects in the Linux ASPM support or the
driver, and that we can fix them if we find out about them.

But I have no details about any of these alleged broken devices, so
it's hard to make progress on them. Maybe we should log a debug note
if the device advertises ASPM support that BIOS didn't enable.

Bjorn