Issues with ACPI _CRS and E820 memory map

From: Bjorn Helgaas
Date: Thu Aug 11 2022 - 14:30:42 EST


This is a heads-up about what I think is a firmware defect in the way
some platforms build _CRS methods. Linux has carried a workaround since
2010, but it breaks some new machines, so it will be disabled on
machines with a BIOS date of 2023 or newer.

Machines that depend on the workaround include:

- Dell Precision T3500
- Lenovo ThinkPad X1 Gen 2
- Asus C523NA (Coral) Chromebook
- Likely any machine using coreboot firmware

The current versions of the machines above work fine, but 2023
versions with similar firmware are likely to break unless the firmware
changes. Please forward this to any firmware folks who may be able to
help with this issue.

Bjorn


SUMMARY

A Linux change will break future platforms that rely on the E820 memory
map to exclude portions of the PCI host bridge windows reported by ACPI
_CRS methods.

Linux discovers PCI host bridge MMIO windows by evaluating the _CRS
method of the ACPI PNP0A03 device that describes the host bridge. It
uses these windows to assign address space to PCI BARs.

In some cases these _CRS methods are incomplete or incorrect, and it's
hard for an OS to work around this.

Below are examples of typical problems with _CRS methods.

PLATFORMS REPORT NON-WINDOW SPACE VIA _CRS

Sometimes _CRS includes host bridge register space or space assigned to
hidden PCI devices that are not enumerable by the OS. When an OS assigns
this space to PCI devices, it may cause conflicts or devices may not
work. This appears to be a firmware defect.

Many platforms report this non-window space as "reserved" in the E820
memory map, and since 2010, Linux has worked around the _CRS defect by
excluding these E820 "reserved" regions from the host bridge MMIO
windows [4].

Example 1:

_CRS includes space that's not usable for PCI devices [1]:

E820: [mem 0xdceff000-0xdfa0ffff] reserved
PNP0A08 _CRS: [mem 0xdfa00000-0xfebfffff]

Note that [mem 0xdfa00000-0xdfa0ffff] is included in both the E820
entry and _CRS.

If Linux assigns [mem 0xdfa00000-0xdfbfffff] to a PCI device, the
system doesn't resume correctly from suspend. If Linux avoids the
[mem 0xdfa00000-0xdfa0ffff] area and instead assigns
[mem 0xdfb00000-0xdfcfffff], resume works correctly.
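
The effect of the workaround on Example 1 can be illustrated with a
small range subtraction. This is only a sketch in Python, not the
kernel's code; the function name exclude_reserved and the tuple
representation are assumptions made for illustration. Ranges use
inclusive ends, like the [mem start-end] notation above.

```python
def exclude_reserved(window, reserved_regions):
    """Return the parts of `window` not covered by any reserved region.

    All ranges are (start, end) tuples with inclusive ends.
    """
    remaining = [window]
    for r_start, r_end in reserved_regions:
        next_remaining = []
        for start, end in remaining:
            if r_end < start or r_start > end:
                # No overlap: keep this piece unchanged.
                next_remaining.append((start, end))
                continue
            if start < r_start:
                # Keep the part below the reserved region.
                next_remaining.append((start, r_start - 1))
            if r_end < end:
                # Keep the part above the reserved region.
                next_remaining.append((r_end + 1, end))
        remaining = next_remaining
    return remaining


# Values from Example 1.
crs_window = (0xdfa00000, 0xfebfffff)
e820_reserved = [(0xdceff000, 0xdfa0ffff)]

usable = exclude_reserved(crs_window, e820_reserved)
for start, end in usable:
    print(f"[mem {start:#010x}-{end:#010x}]")
```

With the Example 1 values, the only remaining range starts at
0xdfa10000, just above the overlap, so BARs are no longer placed in
[mem 0xdfa00000-0xdfa0ffff].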

Example 2:

_CRS includes space assigned to a "hidden" PCI device [2, 5]:

PCI: 00:0d.0 10 base d0000000 limit d0ffffff mem (fixed) # BIOS log

E820: [mem 0xd0000000-0xd0ffffff] reserved
PNP0A08 _CRS: [mem 0x80000000-0xe0000000]

The 00:0d.0 device is assigned the [mem 0xd0000000-0xd0ffffff] space,
but because the device is hidden, the OS cannot enumerate it and
doesn't know what space the device consumes.

PLATFORMS SUPPLY E820 ENTRIES COVERING ENTIRE _CRS WINDOWS

Some recent platforms supply E820 "reserved" regions that cover entire
PCI host bridge windows. If Linux excludes these E820 regions from the
windows, it cannot assign space to PCI BARs, which means hot-added
devices don't work.

Example 3:

E820 has a "reserved" region that completely covers the 32-bit MMIO
window from _CRS [3]:

E820: [mem 0x4bc50000-0xcfffffff] reserved
PNP0A08 _CRS: [mem 0x65400000-0xbfffffff]

Historically, Linux has avoided putting PCI devices in E820 reserved
regions to avoid the problems in examples 1 and 2. Avoiding those
regions in this case means Linux can't assign space for hot-added
devices, so they don't work.
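
A quick check makes the Example 3 problem concrete. The sketch below is
Python written for illustration only; the helper name covers is an
assumption, not a kernel function. It tests whether the E820 region
contains the whole _CRS window and reports how much window space would
be lost by excluding it.

```python
def covers(region, window):
    """True if `region` contains all of `window` (inclusive ends)."""
    return region[0] <= window[0] and region[1] >= window[1]


# Values from Example 3.
e820_reserved = (0x4bc50000, 0xcfffffff)
crs_window = (0x65400000, 0xbfffffff)

window_size = crs_window[1] - crs_window[0] + 1
fully_covered = covers(e820_reserved, crs_window)

if fully_covered:
    # Excluding the reserved region would leave no space for BARs.
    print(f"entire window ({window_size:#x} bytes) is E820 reserved")
else:
    print("some window space remains outside the E820 region")
```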

LINUX PLANS

As far as I know, the ACPI spec does not require an OS to exclude space
from _CRS resources based on the E820 memory map. The cases above also
conflict: examples 1 and 2 need E820 regions excluded from the windows,
while example 3 needs them kept, so Linux cannot handle both by
consulting E820.

The "avoid E820 regions" workaround worked for several years, but it no
longer works because of platforms that advertise E820 regions that cover
*entire* _CRS windows.

We plan to make Linux stop excluding E820 regions from _CRS resources for
platforms with a BIOS date of 2023 or newer, so new platforms or new BIOS
releases that rely on excluding E820 regions may break [6].

Future versions of platforms that rely on this workaround are likely to
break under Linux, and may not boot, unless their firmware makes _CRS
methods complete and accurate. The user's options are to:

- Manually boot with a kernel command line option like "pci=use_e820".

- Wait for an updated kernel with a platform-specific workaround.
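
The planned policy and the command line override can be summarized as
a small decision function. This is a hedged sketch in Python, not the
kernel's logic: the names use_e820_clipping and E820_CLIP_CUTOFF_YEAR
are hypothetical, and the real kernel may apply additional
platform-specific quirks. Only the 2023 cutoff and the "pci=use_e820"
option come from the text above.

```python
E820_CLIP_CUTOFF_YEAR = 2023  # BIOS dates from this year on skip the workaround


def use_e820_clipping(bios_year, cmdline=""):
    """Decide whether E820 regions are excluded from _CRS windows.

    `bios_year` is the year of the BIOS date; `cmdline` is the kernel
    command line as a single string.
    """
    pci_opts = []
    for arg in cmdline.split():
        if arg.startswith("pci="):
            # "pci=" takes a comma-separated list of options.
            pci_opts.extend(arg[len("pci="):].split(","))

    if "use_e820" in pci_opts:
        # The user explicitly asked for the old behavior.
        return True

    # Default: keep the workaround only for older firmware.
    return bios_year < E820_CLIP_CUTOFF_YEAR


print(use_e820_clipping(2022))
print(use_e820_clipping(2023))
print(use_e820_clipping(2023, "quiet pci=use_e820"))
```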

WHY DOESN'T THIS AFFECT MICROSOFT WINDOWS?

Short answer: I suspect it *does*, but it's less likely to be a problem
on Windows.

As far as I know, Windows does not exclude MMIO space from _CRS based on
the E820 memory map. But Windows allocates PCI BARs from the top down,
while Linux allocates from the bottom up. Most of the issues happen with
space at the bottom of the _CRS MMIO windows, so Linux is more likely to
trip over them than Windows is.
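
To illustrate why allocation direction matters, the sketch below
places one naturally aligned 2 MB BAR in the Example 1 window from
each end. It is a simplified Python model, not how either OS actually
allocates; the function names are assumptions, and it handles only a
single power-of-two-sized allocation in an otherwise empty window.

```python
def align_up(value, alignment):
    return (value + alignment - 1) & ~(alignment - 1)


def align_down(value, alignment):
    return value & ~(alignment - 1)


def alloc_bottom_up(window, size):
    """Lowest naturally aligned range of `size` bytes in `window`."""
    start = align_up(window[0], size)
    end = start + size - 1
    return (start, end) if end <= window[1] else None


def alloc_top_down(window, size):
    """Highest naturally aligned range of `size` bytes in `window`."""
    start = align_down(window[1] + 1 - size, size)
    return (start, start + size - 1) if start >= window[0] else None


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


# Values from Example 1; all ranges have inclusive ends.
crs_window = (0xdfa00000, 0xfebfffff)
problem_area = (0xdfa00000, 0xdfa0ffff)
bar_size = 0x200000  # 2 MB

bottom = alloc_bottom_up(crs_window, bar_size)
top = alloc_top_down(crs_window, bar_size)

for name, rng in (("bottom-up", bottom), ("top-down", top)):
    hit = overlaps(rng, problem_area)
    print(f"{name}: [mem {rng[0]:#010x}-{rng[1]:#010x}] overlaps problem area: {hit}")
```

In this model the bottom-up allocation lands on
[mem 0xdfa00000-0xdfbfffff], the range that breaks resume in Example 1,
while the top-down allocation stays at the far end of the window.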

REFERENCES

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2029207
[2] https://lore.kernel.org/linux-pci/4e9fca2f-0af1-3684-6c97-4c35befd5019@xxxxxxxxxx/#t
[3] https://bugzilla.kernel.org/show_bug.cgi?id=206459
[4] https://bugzilla.kernel.org/show_bug.cgi?id=16228
[5] https://review.coreboot.org/plugins/gitiles/coreboot/+/dbcf7b16219d%5E%21/
[6] https://git.kernel.org/linus/0ae084d5a674