Re: [PATCH v2] x86/pci: Stop requiring ECAM to be declared in E820, ACPI or EFI

From: Bjorn Helgaas
Date: Mon Feb 12 2024 - 17:31:08 EST


[+cc Ivan in case there's opportunity to improve FWTS]

On Wed, Jan 17, 2024 at 11:53:50AM -0600, Mario Limonciello wrote:
> On 12/15/2023 16:03, Mario Limonciello wrote:
> > commit 7752d5cfe3d1 ("x86: validate against acpi motherboard resources")
> > introduced checks to ensure that the region described by the MCFG table
> > is also reserved elsewhere, so that a buggy BIOS cannot introduce conflicts.
> >
> > This has proceeded over time to add other types of reservation checks
> > for ACPI PNP resources and EFI MMIO memory type. The PCI firmware spec
> > does say that these checks are only required when the operating system
> > doesn't comprehend the firmware region:
> >
> > ```
> > If the operating system does not natively comprehend reserving the MMCFG
> > region, the MMCFG region must be reserved by firmware. The address range
> > reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
> > reserved by declaring a motherboard resource. For most systems, the
> > motherboard resource would appear at the root of the ACPI namespace
> > (under \_SB) in a node with a _HID of EISAID (PNP0C02), and the resources
> > in this case should not be claimed in the root PCI bus’s _CRS. The
> > resources can optionally be returned in Int15 E820h or EFIGetMemoryMap
> > as reserved memory but must always be reported through ACPI as a
> > motherboard resource.
> > ```
> >
> > Running this check causes problems with accessing extended PCI
> > configuration space on OEM laptops that don't specify the region in PNP
> > resources or in the EFI memory map. That later manifests as problems with
> > dGPU and accessing resizable BAR. Similar problems don't exist in Windows
> > 11 with the exact same laptop/firmware stack.
> >
> > Due to the stability of the Windows ecosystem that x86 machines participate
> > in, it is unlikely that using the region specified in the MCFG table as
> > a reservation will cause a problem. The worst case would be that a buggy
> > BIOS leaves a larger hole in the memory map, unusable for devices, than
> > intended.
> >
> > Change the default behavior to keep the region specified in MCFG even if
> > it's not specified in another source. This is expected to improve
> > machines that otherwise couldn't access PCI extended configuration space.
> >
> > In case this change causes problems, add a kernel command line parameter
> > that can restore the previous behavior.
> >
> > Link: https://members.pcisig.com/wg/PCI-SIG/document/15350
> > PCI Firmware Specification 3.3
> > Section 4.1.2 MCFG Table Description Note 2
> > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> > ---
>
> Bjorn,
>
> Any thoughts on this version since our last conversation on V1?

I really want to clarify the dmesg logging such that it's clear that
PNP0C02 reservation is the only valid way to reserve the space
described by MCFG. Obviously we have to retain the fallbacks, but I
think there should be FW_BUG logging in that case. We currently only
do FW_INFO for missing PNP0C02 reservations.

I think we should try to change FWTS so it validates MCFG addresses
against the PNP0C02 reservations required by spec, instead of
searching E820 for them. The spec doesn't require MCFG regions to be
in E820, and I think searching there encourages the wrong behavior.
That search probably also doesn't work on arm64, which has no E820 at
all.

The /sys/devices/pnp0/00:xx/resources files and "system 00:xx: [mem
..] has been reserved" lines in dmesg would be much better places to
check.
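
As a rough sketch of what such a check could look like (not FWTS code),
the snippet below parses the text of one PNP "resources" file and tests
whether an MCFG region falls entirely inside a single reserved "mem"
range. The function names, the sample text and the 0xe0000000 region are
made up for illustration, and the "mem 0x<start>-0x<end>" line format is
what I'd expect those files to contain, so treat it as an assumption.

```python
import re

# Assumed line format of a PNP "resources" file: "mem 0xe0000000-0xefffffff".
# Lines such as "mem disabled" carry no range and are skipped by the pattern.
MEM_LINE = re.compile(r"^mem\s+(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)", re.MULTILINE)


def parse_mem_ranges(resources_text):
    """Return (start, end) tuples, end inclusive, for each mem range line."""
    return [(int(s, 16), int(e, 16)) for s, e in MEM_LINE.findall(resources_text)]


def is_reserved(ecam_start, ecam_end, ranges):
    """True if one reservation fully contains [ecam_start, ecam_end]."""
    return any(s <= ecam_start and ecam_end <= e for s, e in ranges)


# Hypothetical file contents, not taken from a real machine.
sample = """state = active
io 0x4d0-0x4d1
mem 0xe0000000-0xefffffff
mem disabled
"""

ranges = parse_mem_ranges(sample)
print(is_reserved(0xE0000000, 0xEFFFFFFF, ranges))
```

A real test would presumably read every /sys/devices/pnp0/00:xx/resources
file that belongs to a PNP0C02 device and take the MCFG address range
from the table itself. This sketch also ignores the case where the
region is covered by several adjacent reservations.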

Bjorn