Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.

From: Steve Wahl
Date: Thu Mar 28 2024 - 12:18:07 EST


Note: I cc:'d stable in the email headers by mistake. There is NO
"Cc: stable" tag on this patch; I don't want this to go into stable.

Thanks,

--> Steve

On Thu, Mar 28, 2024 at 11:06:14AM -0500, Steve Wahl wrote:
> When ident_pud_init() uses only gbpages to create identity maps, large
> ranges of addresses not actually requested can be included in the
> resulting table; a 4K request will map a full GB. On UV systems, this
> ends up including regions that will cause hardware to halt the system
> if accessed (these are marked "reserved" by BIOS). Even processor
> speculation into these regions is enough to trigger the system halt.
> MTRRs cannot be used to restrict this speculation, because there are
> not enough MTRRs to cover all the reserved regions.
>
> The fix for that would be to use gbpages only when map creation
> requests include the full GB page of space, and to fall back to
> smaller 2M pages when only portions of a GB page are included in the
> request.
>
> But on some other systems, possibly due to buggy BIOS, that solution
> leaves some areas out of the identity map that are needed for kexec to
> succeed. It is believed that these areas are not marked properly for
> map_acpi_tables() in arch/x86/kernel/machine_kexec_64.c to catch and
> map them. The nogbpages kernel command line option also causes these
> systems to fail even without these changes.
>
> So, create kexec identity maps using full GB pages on all platforms
> but UV; on UV, use narrower 2MB pages in the identity map where a full
> GB page would include areas outside the region requested.
>
> No attempt is made to coalesce mapping requests. If a request requires
> a map entry at the 2M (pmd) level, subsequent mapping requests within
> the same 1G region will also be at the pmd level, even if adjacent or
> overlapping such requests could have been combined to map a full
> gbpage. Existing usage starts with larger regions and then adds
> smaller regions, so this should not have any great consequence.
>
> Signed-off-by: Steve Wahl <steve.wahl@xxxxxxx>
>
> Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
> Reported-by: Pavin Joseph <me@xxxxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@xxxxxxxxxxxxxxx/
> Link: https://lore.kernel.org/all/20240322162135.3984233-1-steve.wahl@xxxxxxx/
> Tested-by: Pavin Joseph <me@xxxxxxxxxxxxxxx>
> Tested-by: Eric Hagberg <ehagberg@xxxxxxxxx>
> Tested-by: Sarah Brofeldt <srhb@xxxxxx>
> ---
>
> v4: Incorporate fix for regression on systems relying on gbpages
> mapping more than the ranges actually requested for successful
> kexec, by limiting the effects of the change to UV systems.
> This patch is based on tip/x86/urgent.
>
> v3: per Dave Hansen review, re-arrange changelog info,
> refactor code to use bool variable and split out conditions.
>
> v2: per Dave Hansen review: Additional changelog info,
> moved pud_large() check earlier in the code, and
> improved the comment describing the conditions
> that restrict gbpage usage.
>
>
> arch/x86/include/asm/init.h | 1 +
> arch/x86/kernel/machine_kexec_64.c | 10 ++++++++++
> arch/x86/mm/ident_map.c | 24 +++++++++++++++++++-----
> 3 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index cc9ccf61b6bd..371d9faea8bc 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -10,6 +10,7 @@ struct x86_mapping_info {
> unsigned long page_flag; /* page flag for PMD or PUD entry */
> unsigned long offset; /* ident mapping offset */
> bool direct_gbpages; /* PUD level 1GB page support */
> + bool direct_gbpages_only; /* use 1GB pages exclusively */
> unsigned long kernpg_flag; /* kernel pagetable flag override */
> };
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index b180d8e497c3..3a2f5d291a88 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -28,6 +28,7 @@
> #include <asm/setup.h>
> #include <asm/set_memory.h>
> #include <asm/cpu.h>
> +#include <asm/uv/uv.h>
>
> #ifdef CONFIG_ACPI
> /*
> @@ -212,6 +213,15 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>
> if (direct_gbpages)
> info.direct_gbpages = true;
> + /*
> + * UV systems need restrained use of gbpages in the identity
> + * maps to avoid system halts. But some other systems rely on
> + * using gbpages to expand mappings outside the regions
> + * actually listed, to include areas required for kexec but
> + * not explicitly named by the bios.
> + */
> + if (!is_uv_system())
> + info.direct_gbpages_only = true;
>
> for (i = 0; i < nr_pfn_mapped; i++) {
> mstart = pfn_mapped[i].start << PAGE_SHIFT;
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 968d7005f4a7..a538a54aba5d 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -26,18 +26,32 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> for (; addr < end; addr = next) {
> pud_t *pud = pud_page + pud_index(addr);
> pmd_t *pmd;
> + bool use_gbpage;
>
> next = (addr & PUD_MASK) + PUD_SIZE;
> if (next > end)
> next = end;
>
> - if (info->direct_gbpages) {
> - pud_t pudval;
> + /* if this is already a gbpage, this portion is already mapped */
> + if (pud_leaf(*pud))
> + continue;
> +
> + /* Is using a gbpage allowed? */
> + use_gbpage = info->direct_gbpages;
>
> - if (pud_present(*pud))
> - continue;
> + if (!info->direct_gbpages_only) {
> + /* Don't use gbpage if it maps more than the requested region. */
> + /* at the beginning: */
> + use_gbpage &= ((addr & ~PUD_MASK) == 0);
> + /* ... or at the end: */
> + use_gbpage &= ((next & ~PUD_MASK) == 0);
> + }
> + /* Never overwrite existing mappings */
> + use_gbpage &= !pud_present(*pud);
> +
> + if (use_gbpage) {
> + pud_t pudval;
>
> - addr &= PUD_MASK;
> pudval = __pud((addr - info->offset) | info->page_flag);
> set_pud(pud, pudval);
> continue;
>
> base-commit: b6540de9b5c867b4c8bc31225db181cc017d8cc7
> --
> 2.26.2
>
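
For anyone following the logic without the kernel tree at hand, here is
a minimal userspace sketch of the per-1G decision described in the
changelog above. It is a model under stated assumptions, not kernel
code: slot_state, may_use_gbpage() and model_map_range() are
hypothetical names made up for illustration, one array entry stands in
for one PUD entry, and the array is indexed by (addr >> 30) rather than
by pud_index(). Only the conditions on use_gbpage are taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PUD_SHIFT 30
#define PUD_SIZE  (UINT64_C(1) << PUD_SHIFT)	/* 1G, as on x86-64 */
#define PUD_MASK  (~(PUD_SIZE - 1))

/* Hypothetical stand-in for the state of one PUD entry. */
enum slot_state { SLOT_EMPTY, SLOT_GBPAGE, SLOT_PMD_TABLE };

/*
 * May the piece [addr, next) of a request be mapped with one 1G page?
 * Mirrors the conditions the patch places on use_gbpage.
 */
static bool may_use_gbpage(uint64_t addr, uint64_t next,
			   bool direct_gbpages, bool direct_gbpages_only,
			   enum slot_state slot)
{
	bool use_gbpage = direct_gbpages;

	if (!direct_gbpages_only) {
		/* Don't map more than requested, at the beginning ... */
		use_gbpage &= ((addr & ~PUD_MASK) == 0);
		/* ... or at the end. */
		use_gbpage &= ((next & ~PUD_MASK) == 0);
	}
	/* Never overwrite an existing mapping. */
	use_gbpage &= (slot == SLOT_EMPTY);

	return use_gbpage;
}

/*
 * Model of the loop: walk [addr, end) one 1G region at a time and
 * record whether each region ends up as a gbpage or at the pmd level.
 */
static void model_map_range(enum slot_state *slots, size_t nslots,
			    uint64_t addr, uint64_t end,
			    bool direct_gbpages, bool direct_gbpages_only)
{
	uint64_t next;

	for (; addr < end; addr = next) {
		size_t idx = (size_t)(addr >> PUD_SHIFT);

		assert(idx < nslots);	/* model only: caller sizes the array */

		next = (addr & PUD_MASK) + PUD_SIZE;
		if (next > end)
			next = end;

		/* Already a gbpage: this portion is already mapped. */
		if (slots[idx] == SLOT_GBPAGE)
			continue;

		if (may_use_gbpage(addr, next, direct_gbpages,
				   direct_gbpages_only, slots[idx]))
			slots[idx] = SLOT_GBPAGE;
		else
			slots[idx] = SLOT_PMD_TABLE;	/* 2M pages below */
	}
}
```

In this model, a 4K request at 0x1000 with direct_gbpages_only set
(the non-UV case) marks the whole first 1G region as a gbpage, while
the same request with it clear (the UV case) leaves that region at the
pmd level. Two later requests covering [0, 512M) and [512M, 1G) with
direct_gbpages_only clear both stay at the pmd level, which is the
"no coalescing" behavior the changelog mentions.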

--
Steve Wahl, Hewlett Packard Enterprise