Re: [PATCH v2 0/5] arm64: Make kexec_file_load honor iomem reservations

From: James Morse
Date: Fri Jun 04 2021 - 12:20:48 EST


Hi Marc,

On 31/05/2021 10:57, Marc Zyngier wrote:
> This series is a complete departure from the approach I initially sent
> almost a month ago[0]. Instead of trying to teach EFI, ACPI and other
> subsystem to use memblock, I've decided to stick with the iomem
> resource tree and use that exclusively for arm64.

> This means that my current approach is (despite what I initially
> replied to both Dave and Catalin) to provide an arm64-specific
> implementation of arch_kexec_locate_mem_hole() which walks the
> resource tree and excludes ranges of RAM that have been registered for
> any odd purpose. This is exactly what the userspace implementation
> does, and I don't really see a good reason to diverge from it.

Because in the ideal world we'd have only 'is it reserved' list to check against.
Memblock has been extended before. The resource-list is overly stringy, and I'm not sure
we can shove everything in the resource list.

Kexec already has problems on arm64 with memory hotplug. Fixing this for regular kexec in
/proc/iomem was rejected, and memblock's memblock_is_hotpluggable() is broken because
free_low_memory_core_early() does this:
| memblock_clear_hotplug(0, -1)

Once that has been unpicked its clear kexec_file_load() can use
memblock_is_hotpluggable(). (its on the todo list, well, jira)


I'd prefer to keep kexec using memblock because it _shouldn't_ change after boot. Having
an "I want to reserve this and make it persistent over kexec" call that can happen at any
time can't work if the kexec image has already been loaded.
Practically, once user-space has started, you can't have new things you want to reserve
over kexec.


I don't see how the ACPI tables can escape short of a firmware bug. Could someone with an
affected platform boot with efi=debug and post the EFI memory map and the 'ACPI: FOO
0xphysicaladdress' stuff at the top of the boot log?


efi_mem_reserve_persistent() has one caller for the GIC ITS stuff.

For the ITS, the reservations look like they are behind irqchip_init(), which is well
before the arch_initcall() that updates the resource tree from memblock. Your v1's first
patch should be sufficient.


> Again, this allows my Synquacer board to reliably use kexec_file_load
> with as little as 256M, something that would always fail before as it
> would overwrite most of the reserved tables.
>
> Although this series still targets 5.14, the initial patch is a
> -stable candidate, and disables non-kdump uses of kexec_file_load. I
> have limited it to 5.10, as earlier kernels will require a different,
> probably more invasive approach.
>
> Catalin, Ard: although this series has changed a bit compared to v1,
> I've kept your AB/RB tags. Should anything seem odd, please let me
> know and I'll drop them.


Thanks,

James


[0] I'm pretty sure this is enough. (Not tested)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 4b7ee3fa9224..3ed45153ce7f 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -893,7 +893,7 @@ static int __init efi_memreserve_map_root(void)
return 0;
}

-static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
+static int __efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
{
struct resource *res, *parent;

@@ -911,6 +911,16 @@ static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
return parent ? request_resource(parent, res) : 0;
}

+static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
+{
+ int err = __efi_mem_reserve_iomem(addr, size);
+
+ if(IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK) && !err)
+ memblock_reserve(addr, size);
+
+ return err;
+}
+
int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
{
struct linux_efi_memreserve *rsv;