Re: [PATCH] x86/mm/ident_map: Use full gbpages in identity maps except on UV platform.

From: Eric W. Biederman
Date: Mon Mar 25 2024 - 12:48:49 EST


Russ Anderson <rja@xxxxxxx> writes:

> On Sun, Mar 24, 2024 at 11:31:39AM +0100, Ingo Molnar wrote:
>>
>> * Steve Wahl <steve.wahl@xxxxxxx> wrote:
>>
>> > Some systems have ACPI tables that don't include everything that needs
>> > to be mapped for a successful kexec. These systems rely on identity
>> > maps that include the full gigabyte surrounding any smaller region
>> > requested for kexec success. Without this, they fail to kexec and end
>> > up doing a full firmware reboot.
>> >
>> > So, reduce the use of GB pages only on systems where this is known to
>> > be necessary (specifically, UV systems).
>> >
>> > Signed-off-by: Steve Wahl <steve.wahl@xxxxxxx>
>> > Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
>> > Reported-by: Pavin Joseph <me@xxxxxxxxxxxxxxx>
>>
>> Sigh, why was d794734c9bbf marked for a -stable backport? The commit
>> never explains ...
>
> I will try to explain, since Steve is offline. That commit fixes a
> legitimate bug where more address range is mapped (1G) than the
> requested address range. The fix avoids the issue of cpu speculativly
> loading beyond the requested range, which inludes specutalive loads
> from reserved memory. That is why it was marked for -stable.

To call that a bug presumes that the memory type range registers
were not setup properly by the boot firmware.

I think I saw something that the existence of memory type range
registers is changing/has changed in recent cpus, but historically it
has been the job of the memory type range registers to ensure that the
attributes of specific addresses are correct.

The memory attributes should guide the speculation.

To depend upon page tables to ensure the attributes are correct would
presumably require a cpu that does not have support for disabling page
tables in 32bit mode and does not have 16bit mode.

On older systems (I haven't looked lately) I have seen all kinds of
oddities in the descriptions of memory. Like not describing the memory
at address 0 where the real mode IDT lives. So I am not at all certain
any firmware information can be depended upon or reasonably expected to
be complete. For a while there was no concept of firmware memory areas
so on some older systems it was actually required for their to be gaps
in the description of memory provided to the system, so that operating
systems would not touch memory used by the firmware.

Which definitely means in the case of kexec there are legitimate reasons
to access memory areas that are well known but have not always been
descried by the boot firmware. So the assertion that it is necessarily
a firmware bug for not describing all of memory of memory is at least
historically incorrect on x86_64.

There may be different requirements for the kexec identity map and the
ordinary kernel boot type memory map and as we look at solutions that
can reasonably be explored

> Some memory ends up not being mapped, but it is not
> clear if it is due to some other bug, such as bios not accurately
> providing the right memory map or some other kernel code path did
> not map what it should.

> The 1G mapping covers up that type issue.

I have seen this assertion repeated several times, and at least
historically on x86_64 it is most definitely false. The E820 map which
was the primary information source for a long time could not describe
all of memory so depending upon it to be complete is erroneous.

>> When there's boot breakage with new patches, we back out the bad patch
>> and re-try in 99.9% of the cases.
>
> Steve can certainly merge his two patches and resubmit, to replace the
> reverted original patch. He should be on in the morning to speak for
> himself.

I am going to push back and suggest that this is perhaps a bug in the
HPE UV systems firmware not setting up the cpus memory type range
registers correctly.

Unless those systems are using new fangled cpus that don't have 16bit
and 32bit support, and don't implement memory type range registers,
I don't see how something that only affects HPE UV systems could be
anything except an HPE UV specific bug.

Eric