Re: [RFC] efi: Add ACPI_MEMORY_NVS into the linear map

From: Ard Biesheuvel
Date: Thu Feb 15 2024 - 18:21:36 EST


(cc Oliver)

On Thu, 15 Feb 2024 at 23:51, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
>
> Currently ACPI_MEMORY_NVS is omitted from the linear map, which causes
> trouble with the following firmware memory region setup:
>
> [..] efi: 0x0000dfd62000-0x0000dfd83fff [ACPI Reclaim|...]
> [..] efi: 0x0000dfd84000-0x0000dfd87fff [ACPI Mem NVS|...]
>

Which memory types were listed here?

> , on ARM64 with 64k page size, the whole 0x0000dfd80000-0x0000dfd8ffff
> range will be omitted from the linear map due to 64k round-up. And
> a page fault happens when trying to access the ACPI_RECLAIM_MEMORY:
>
> [...] Unable to handle kernel paging request at virtual address ffff0000dfd80000
>

You trimmed all the useful information here. ACPI reclaim memory is
reclaimable, but we don't actually do so in Linux. So this is not
general purpose memory, it is used for a specific purpose, and the
code that accesses it is assuming that it is accessible via the linear
map. There are reasons why this may not be the case, so the fix might
be to use memremap() in the access instead.

> To fix this, add ACPI_MEMORY_NVS into the linear map.
>

There is a requirement in the arm64 bindings in the UEFI spec that
says that mixed attribute mappings within a 64k page are not allowed.

This is not a very clear description of the requirement or the issue
it is intended to work around. In short, the following memory types
are special:

- EfiRuntimeServicesCode
- EfiRuntimeServicesData
- EfiReserved
- EfiACPIMemoryNVS

and care must be taken to ensure that allocations of these types are
never mapped with mismatched attributes, which might happen on a 64k
page size OS if a mapping is rounded outwards and ends up covering the
adjacent region.
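
To make the rounding concrete with the two regions quoted above, here is
a minimal userspace sketch (not kernel code). The struct and helper
names are made up for illustration, and ranges are expressed with an
exclusive end address:

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K 0x10000ULL

/* Hypothetical helper type: physical range [start, end), end exclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Round a range outwards to 64k granularity (start down, end up). */
static struct range round_outwards_64k(struct range r)
{
	struct range out = {
		.start = r.start & ~(SZ_64K - 1),
		.end = (r.end + SZ_64K - 1) & ~(SZ_64K - 1),
	};
	return out;
}

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}
```

With the ACPI reclaim region as { 0xdfd62000, 0xdfd84000 } and the NVS
region as { 0xdfd84000, 0xdfd88000 }, the two do not overlap as
described by the firmware. Rounding the NVS region outwards gives
0xdfd80000-0xdfd8ffff, which does overlap the tail of the ACPI reclaim
region, and that tail is where the faulting address in the report lies.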

The Tianocore reference implementation of UEFI achieves this by simply
aligning all allocations of these types to 64k, so that the OS never
has to reason about whether or not region A and region B sharing a 64k
page frame could have mappings or aliases that are incompatible.
(I.e., all mappings of A are compatible with all mappings of B)
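
As a rough illustration of what that alignment means for a single memory
descriptor, a check along these lines could be used. This is a
userspace sketch under my own assumptions: the function name is
hypothetical, and it only takes the 4k EFI page size as given.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 0x1000ULL  /* UEFI descriptors count 4k pages */
#define SZ_64K        0x10000ULL

/*
 * Hypothetical check: does a region described by (phys_addr, num_pages)
 * start and end on 64k boundaries? If so, no 64k page frame is shared
 * with a neighbouring region.
 */
static bool region_is_64k_aligned(uint64_t phys_addr, uint64_t num_pages)
{
	uint64_t size = num_pages * EFI_PAGE_SIZE;

	return (phys_addr % SZ_64K) == 0 && (size % SZ_64K) == 0;
}
```

The NVS region from the report (0xdfd84000, 4 pages) fails this check,
whereas a region at 0xdfd80000 spanning 16 pages would pass.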

ACPI reclaim is just memory, whereas EfiACPIMemoryNVS could have special
semantics that the OS knows nothing about. That makes it unsafe to
assume that we can simply create a cacheable and writable mapping for
this memory.

> Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 5.15+
> ---
> We hit this in an ARM64 Hyper-V VM when using 64k page size, although
> this issue may also be fixed if the efi memory regions are all 64k
> aligned, but I don't find this memory region setup is invalid per UEFI
> spec, also I don't find that spec disallows ACPI_MEMORY_NVS to be mapped
> in the OS linear map, but if there is any better way or I'm reading the
> spec incorrectly, please let me know.
>

I'd prefer fixing this in the firmware.

> It's Cced stable since 5.15 because that's when Hyper-V ARM64 support is
> added, and Hyper-V is the only one that hits the problem so far.
>
> drivers/firmware/efi/efi-init.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
> index a00e07b853f2..9a1b9bc66d50 100644
> --- a/drivers/firmware/efi/efi-init.c
> +++ b/drivers/firmware/efi/efi-init.c
> @@ -139,6 +139,7 @@ static __init int is_usable_memory(efi_memory_desc_t *md)
> case EFI_LOADER_CODE:
> case EFI_LOADER_DATA:
> case EFI_ACPI_RECLAIM_MEMORY:
> + case EFI_ACPI_MEMORY_NVS:
> case EFI_BOOT_SERVICES_CODE:
> case EFI_BOOT_SERVICES_DATA:
> case EFI_CONVENTIONAL_MEMORY:
> @@ -202,8 +203,12 @@ static __init void reserve_regions(void)
> if (!is_usable_memory(md))
> memblock_mark_nomap(paddr, size);
>
> - /* keep ACPI reclaim memory intact for kexec etc. */
> - if (md->type == EFI_ACPI_RECLAIM_MEMORY)
> + /*
> + * keep ACPI reclaim and NVS memory and intact for kexec
> + * etc.
> + */
> + if (md->type == EFI_ACPI_RECLAIM_MEMORY ||
> + md->type == EFI_ACPI_MEMORY_NVS)
> memblock_reserve(paddr, size);
> }
> }
> --
> 2.43.0
>
>