Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

From: Bjorn Helgaas
Date: Mon Nov 20 2023 - 11:29:58 EST


On Sat, Nov 18, 2023 at 03:21:43PM +0100, Tomasz Pala wrote:
> On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:
>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=218050
> >>
> >> I think the problem is that the MMCONFIG region is at
> >> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
> >> host bridge windows reported via _CRS:
> >>
> >> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >> pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
> >>
> >> I'll try to figure out how to deal with that. In the meantime, would
> >> you mind attaching the contents of /proc/iomem to the bugzilla? I
> >
> > I attached a debug patch to both bugzilla entries. If you could
> > attach the "acpidump" output and (if practical) boot a kernel with the
> > debug patch and attach the dmesg logs, that would be great.
>
> I've posted the files. There are signs of buggy BIOS, but I don't expect
> any firmware update to be released for this hw anymore.

Thank you! A BIOS update is almost never the answer because even if
an update exists, we have to assume that most users in the field will
never install the update.

I want to look at the BIOS info in case we can learn about something
*Linux* is doing wrong. This most likely works fine with Windows, so
I assume Linux is doing something wrong or at least differently than
Windows.

> DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019
>
> .text .data .bss are not marked as E820_TYPE_RAM!

Added by 4eea6aa581ab ("x86, mm: if kernel .text .data .bss are not
marked as E820_RAM, complain and fix"). No idea. A shame we didn't
include the .text/.data values in the message.

> tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED

Added by 316253406959 ("x86, intel_txt: Intel TXT boot support"). No
idea about this either.

> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]

Both related to arch_rmrr_sanity_check(), added by f036c7fa0ab6
("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
and f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails
sanity check").

No idea about this one either. The VT-d spec (r1.3, sec 8.4) says
"BIOS must report the RMRR reported memory addresses as reserved in
the system memory map returned through methods such as INT15, EFI
GetMemoryMap etc."

arch_rmrr_sanity_check() only looks at your e820 map, which only has
this:

BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000d1f36fff] usable

I think Linux basically converts the info from EFI GetMemoryMap to
e820 format, and booting with "efi=debug" should show more details of
that conversion.
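
For reference, the sanity check boils down to asking whether the whole
RMRR range is covered by reserved entries in the memory map. Here is a
rough userspace sketch of that idea, not the kernel's code: struct
region, the REGION_* constants, and range_covered() are names I made up
for illustration, and it assumes a map sorted by start address with no
overlapping entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the kernel's e820 definitions. */
enum region_type { REGION_USABLE = 1, REGION_RESERVED = 2 };

struct region {
	uint64_t start;		/* first byte */
	uint64_t end;		/* last byte, inclusive */
	enum region_type type;
};

/*
 * Return 1 if every byte of [start, end] lies in entries of the given
 * type, else 0.  Assumes the map is sorted by start address and has no
 * overlapping entries.
 */
static int range_covered(const struct region *map, size_t n,
			 uint64_t start, uint64_t end, enum region_type type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (map[i].type != type)
			continue;
		if (map[i].end < start || map[i].start > start)
			continue;	/* does not contain the current start */
		if (map[i].end >= end)
			return 1;	/* remainder is fully covered */
		start = map[i].end + 1;	/* keep going after this entry */
	}
	return 0;
}

/* The two e820 entries quoted above. */
static const struct region example_map[] = {
	{ 0x0000000000000000ULL, 0x000000000009ffffULL, REGION_USABLE },
	{ 0x0000000000100000ULL, 0x00000000d1f36fffULL, REGION_USABLE },
};
```

With only those two entries, asking whether
[0xdf243000-0xdf251fff] is covered by REGION_RESERVED returns 0, which
is the condition the DMAR messages are complaining about.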

Anyway, this is all a tangent.

> BTW is there a reason for this logging discrepancy?
>
> efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
> efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map
>
> efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
> efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
>
> This is arch/x86/platform/efi/efi.c:
> static void __init efi_remove_e820_mmio(void)
>
> Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
> Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10

You mean the MB vs KB difference? That's my fault. I guess I used KB
for the "Not removing" message because those regions are smaller
(< 256KB), so the size in MB wouldn't be useful there. We could use KB
for both, but I used MB for the "Remove" case because it's a little
easier to read and I expected "Not removing" to be a relatively
unusual case.
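
To make the discrepancy concrete, here is a small sketch of the two
size conversions from the format strings you quoted (size >> 20 and
size >> 10). The helper names are mine, not from efi.c:

```c
#include <stdint.h>

/* Size in bytes of an inclusive range [start, end]. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	return end - start + 1;
}

/* "Remove" message: size >> 20, i.e. whole MB, rounded down. */
static uint64_t size_in_mb(uint64_t size)
{
	return size >> 20;
}

/* "Not removing" message: size >> 10, i.e. whole KB, rounded down. */
static uint64_t size_in_kb(uint64_t size)
{
	return size >> 10;
}
```

For [0xe0000000-0xefffffff] this gives 256 (MB) and 262144 (KB), and
for [0xff000000-0xffffffff] it gives 16 and 16384, matching the log
lines above. Anything smaller than 1MB would print as 0MB, which is why
MB isn't useful for the small regions.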

Bjorn