Re: [5.17 regression] "x86/PCI: Ignore E820 reservations for bridge windows on newer systems" breaks suspend/resume

From: Hans de Goede
Date: Mon Feb 14 2022 - 07:42:42 EST


Hi,

On 2/10/22 07:39, Mika Westerberg wrote:
> Hi Hans,
>
> On Wed, Feb 09, 2022 at 05:08:13PM +0100, Hans de Goede wrote:
>> As mentioned in my email from 10 seconds ago I think a better simpler
>> fix would be to just do:
>>
>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>> index 9b9fb7882c20..18656f823764 100644
>> --- a/arch/x86/kernel/resource.c
>> +++ b/arch/x86/kernel/resource.c
>> @@ -28,6 +28,10 @@ static void remove_e820_regions(struct resource *avail)
>> int i;
>> struct e820_entry *entry;
>>
>> + /* Only remove E820 reservations on classic BIOS boot */
>> + if (efi_enabled(EFI_MEMMAP))
>> + return;
>> +
>> for (i = 0; i < e820_table->nr_entries; i++) {
>> entry = &e820_table->entries[i];
>>
>>
>> I'm curious what you think of that?
>
> I'm not an expert in this e820 stuff but this one looks really simple
> and makes sense to me. So definitely should go with it assuming there
> are no objections from the x86 maintainers.

Unfortunately with this suspend/resume is still broken on the ThinkPad X1 carbon gen 2 of the reporter reporting the regression. The reporter has been kind enough to also test in EFI mode (at my request) and then the problem is back again with this patch. So just differentiating between EFI / non EFI mode is not an option.

FYI, here is what I believe is the root-cause of the issue on the ThinkPad X1 carbon gen 2:

The E820 reservations table has the following in both BIOS and EFI boot modes:

[ 0.000000] BIOS-e820: [mem 0x00000000dceff000-0x00000000dfa0ffff] reserved

Which has a small overlap with:

[ 0.884684] pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window]

This leads to the following difference in assignments of PCI resources when honoring E820 reservations

[ 0.966573] pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff]
[ 0.966698] pci_bus 0000:02: resource 1 [mem 0xdfb00000-0xdfcfffff]

vs the following when ignoring E820 reservations:

[ 0.966850] pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff]
[ 0.966973] pci_bus 0000:02: resource 1 [mem 0xdfa00000-0xdfbfffff]

And the overlap of 0xdfa00000-0xdfa0ffff from the e820 reservations seems to be what is causing the suspend/resume issue.

###

As already somewhat discussed, I'll go and prepare this solution instead:

1. Add E820_TYPE_MMIO to enum e820_type and modify the 2 places which check for
type == reserved to treat this as reserved too, so as to not have any
functional changes there

2. Modify the code building e820 tables from the EFI memmap to use
E820_TYPE_MMIO for MMIO EFI memmap entries.

3. Modify arch/x86/kernel/resource.c: remove_e820_regions() to skip
e820 table entries with a type of E820_TYPE_MMIO,
this would actually be a functional change and should fix the
issues we are trying to fix.

Regards,

Hans