Re: [PATCH] kexec: allocate kernel above bzImage's pref_address

From: H. Peter Anvin
Date: Fri Dec 15 2023 - 18:09:37 EST


On December 15, 2023 1:17:00 PM PST, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>On 12/15/23 11:05, Chris Koch wrote:
>> A relocatable kernel will relocate itself to pref_address if it is
>> loaded below pref_address. This means a booted kernel may be relocating
>> itself to an area with reserved memory on modern systems, potentially
>> clobbering arbitrary data that may be important to the system.
>>
>> This is often the case, as the default value of PHYSICAL_START is
>> 0x1000000 and kernels are typically loaded at 0x100000 or above by
>> bootloaders like iPXE or kexec. GRUB behaves like this patch does.
>>
>> Also fixes the documentation around pref_address and PHYSICAL_START to
>> be accurate.
>
>Are you reporting a bug and is this a bug fix? It's not super clear
>from the changelog.
>
>
>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
>> index 22cc7a040dae..49bea8986620 100644
>> --- a/Documentation/arch/x86/boot.rst
>> +++ b/Documentation/arch/x86/boot.rst
>> @@ -878,7 +878,8 @@ Protocol: 2.10+
>> address if possible.
>>
>> A non-relocatable kernel will unconditionally move itself and to run
>> - at this address.
>> + at this address. A relocatable kernel will move itself to this address if it
>> + loaded below this address.
>
>I think we should avoid saying the same things over and over again in
>different spots.
>
>Here, it doesn't really help to enumerate the different interpretations
>of 'pref_address'. All that matters is that the bootloader can avoid
>the overhead of a later copy if it can place the kernel at
>'pref_address'. The exact reasons that various kernels might decide to
>relocate are unimportant here.
>
>> ============ =======
>> Field name: init_size
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 3762f41bb092..1370f43328d7 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -2109,11 +2109,11 @@ config PHYSICAL_START
>> help
>> This gives the physical address where the kernel is loaded.
>>
>> - If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
>> - bzImage will decompress itself to above physical address and
>> - run from there. Otherwise, bzImage will run from the address where
>> - it has been loaded by the boot loader and will ignore above physical
>> - address.
>> + If the kernel is not relocatable (CONFIG_RELOCATABLE=n) then bzImage
>> + will decompress itself to above physical address and run from there.
>> + Otherwise, bzImage will run from the address where it has been loaded
>> + by the boot loader. The only exception is if it is loaded below the
>> + above physical address, in which case it will relocate itself there.
>
>I kinda dislike how this is written. It's written almost like code
>where you're spelling out the conditions. I prefer something much
>higher-level.
>
> This gives a minimum physical address at which the kernel can be
> loaded.
>
> CONFIG_RELOCATABLE=n kernels will be decompressed to and must
> run at PHYSICAL_START exactly.
>
> CONFIG_RELOCATABLE=y kernels can run at any address above
> PHYSICAL_START. If a kernel is loaded below PHYSICAL_START, it
> will relocate itself to PHYSICAL_START.
>
>> In normal kdump cases one does not have to set/change this option
>> as now bzImage can be compiled as a completely relocatable image
>> diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
>> index a61c12c01270..5dcd232d58bf 100644
>> --- a/arch/x86/kernel/kexec-bzimage64.c
>> +++ b/arch/x86/kernel/kexec-bzimage64.c
>> @@ -498,7 +498,10 @@ static void *bzImage64_load(struct kimage *image, char *kernel,
>> kbuf.bufsz = kernel_len - kern16_size;
>> kbuf.memsz = PAGE_ALIGN(header->init_size);
>> kbuf.buf_align = header->kernel_alignment;
>> - kbuf.buf_min = MIN_KERNEL_LOAD_ADDR;
>> + if (header->pref_address < MIN_KERNEL_LOAD_ADDR)
>> + kbuf.buf_min = MIN_KERNEL_LOAD_ADDR;
>> + else
>> + kbuf.buf_min = header->pref_address;
>> kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>> ret = kexec_add_buffer(&kbuf);
>> if (ret)
>
>Comment, please.
>
>It isn't clear from this hunk why or how this fixes the bug. How does
>this manage to avoid clobbering reserved areas?

It is a bug and a bug fix.