Re: [PATCH] x86/mm/ident_map: Use full gbpages in identity maps except on UV platform.

From: Russ Anderson
Date: Mon Mar 25 2024 - 06:28:21 EST


On Sun, Mar 24, 2024 at 11:31:39AM +0100, Ingo Molnar wrote:
>
> * Steve Wahl <steve.wahl@xxxxxxx> wrote:
>
> > Some systems have ACPI tables that don't include everything that needs
> > to be mapped for a successful kexec. These systems rely on identity
> > maps that include the full gigabyte surrounding any smaller region
> > requested for kexec success. Without this, they fail to kexec and end
> > up doing a full firmware reboot.
> >
> > So, reduce the use of GB pages only on systems where this is known to
> > be necessary (specifically, UV systems).
> >
> > Signed-off-by: Steve Wahl <steve.wahl@xxxxxxx>
> > Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
> > Reported-by: Pavin Joseph <me@xxxxxxxxxxxxxxx>
>
> Sigh, why was d794734c9bbf marked for a -stable backport? The commit
> never explains ...

I will try to explain, since Steve is offline. That commit fixes a
legitimate bug where more address range is mapped (1G) than the
requested address range. The fix avoids the issue of the CPU
speculatively loading beyond the requested range, which includes
speculative loads from reserved memory. That is why it was marked for
-stable.
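
To put rough numbers on the over-mapping, here is a small arithmetic
sketch. It is not the actual ident_map code; the helper names and the
example addresses are made up for illustration. It only assumes that
covering a range with pages of a given size rounds the start down and
the end up to that page size.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/*
 * Hypothetical helper: number of bytes that end up mapped when the
 * range [start, end) is covered with pages of page_size bytes.
 * page_size must be a power of two.
 */
static uint64_t mapped_bytes(uint64_t start, uint64_t end, uint64_t page_size)
{
	assert(end > start);

	uint64_t lo = start & ~(page_size - 1);                  /* round down */
	uint64_t hi = (end + page_size - 1) & ~(page_size - 1);  /* round up */

	return hi - lo;
}

/* Hypothetical helper: bytes mapped beyond what was requested. */
static uint64_t excess_bytes(uint64_t start, uint64_t end, uint64_t page_size)
{
	return mapped_bytes(start, end, page_size) - (end - start);
}
```

For a made-up 4M request at 0x40200000, excess_bytes() gives 0 with 2M
pages, but 1G minus 4M with a 1G page. That extra range is what the CPU
may speculatively load from.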

> If it's broken, it should be reverted - instead of trying to partially
> revert and then maybe break some other systems.

Three people reported that mapping only the correct address range
caused problems on their platforms. https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@xxxxxxxxxxxxxxx/
Steve and several people helped debug the issue. The commit itself
looks correct but the correct behavior causes some side effect on
a few platforms. Some memory ends up not being mapped, but it is not
clear whether that is due to some other bug, such as the BIOS not
providing an accurate memory map, or some other kernel code path not
mapping what it should. The 1G mapping covers up that type of issue.

Steve's second patch was meant to avoid breaking those platforms while
leaving the fix in place on the platform where the original mapping
problem was detected (the UV platform).

> When there's boot breakage with new patches, we back out the bad patch
> and re-try in 99.9% of the cases.

Steve can certainly merge his two patches and resubmit, to replace the
reverted original patch. He should be on in the morning to speak for
himself.

Thanks
--
Russ Anderson, SuperDome Flex Linux Kernel Group Manager
HPE - Hewlett Packard Enterprise (formerly SGI) rja@xxxxxxx