Re: [PATCH] kexec: allocate kernel above bzImage's pref_address

From: Chris Koch
Date: Fri Dec 15 2023 - 16:39:06 EST


On Fri, Dec 15, 2023 at 1:17 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 12/15/23 11:05, Chris Koch wrote:
> > A relocatable kernel will relocate itself to pref_address if it is
> > loaded below pref_address. This means a booted kernel may be relocating
> > itself to an area with reserved memory on modern systems, potentially
> > clobbering arbitrary data that may be important to the system.
> >
> > This is often the case, as the default value of PHYSICAL_START is
> > 0x1000000 and kernels are typically loaded at 0x100000 or above by
> > bootloaders like iPXE or kexec. GRUB behaves like this patch does.
> >
> > Also fixes the documentation around pref_address and PHYSICAL_START to
> > be accurate.
>
> Are you reporting a bug and is this a bug fix? It's not super clear
> from the changelog.

Yes, it's a bug fix. I reported the bug yesterday in
https://lkml.org/lkml/2023/12/14/1529 -- I'm happy to reword the
changelog so that it clearly says this is a bug fix.

>
>
> > diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
> > index 22cc7a040dae..49bea8986620 100644
> > --- a/Documentation/arch/x86/boot.rst
> > +++ b/Documentation/arch/x86/boot.rst
> > @@ -878,7 +878,8 @@ Protocol: 2.10+
> > address if possible.
> >
> > A non-relocatable kernel will unconditionally move itself and to run
> > - at this address.
> > + at this address. A relocatable kernel will move itself to this address if it
> > + loaded below this address.
>
> I think we should avoid saying the same things over and over again in
> different spots.
>
> Here, it doesn't really help to enumerate the different interpretations
> of 'pref_address'. All that matters is that the bootloader can avoid
> the overhead of a later copy if it can place the kernel at
> 'pref_address'. The exact reasons that various kernels might decide to
> relocate are unimportant here.

I think it's important documentation for bootloader authors. It's not
about avoiding overhead; it's about avoiding clobbering areas of
memory that may be reserved in the e820 / EFI memory map. The kernel
will clobber them when it relocates itself to pref_address, because it
does so without checking what's reserved and what's not. The added
text emphasizes the importance of choosing a load address at or above
pref_address. Happy to reword it in some way to reflect that.

>
> > ============ =======
> > Field name: init_size
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 3762f41bb092..1370f43328d7 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2109,11 +2109,11 @@ config PHYSICAL_START
> > help
> > This gives the physical address where the kernel is loaded.
> >
> > - If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
> > - bzImage will decompress itself to above physical address and
> > - run from there. Otherwise, bzImage will run from the address where
> > - it has been loaded by the boot loader and will ignore above physical
> > - address.
> > + If the kernel is not relocatable (CONFIG_RELOCATABLE=n) then bzImage
> > + will decompress itself to above physical address and run from there.
> > + Otherwise, bzImage will run from the address where it has been loaded
> > + by the boot loader. The only exception is if it is loaded below the
> > + above physical address, in which case it will relocate itself there.
>
> I kinda dislike how this is written. It's written almost like code
> where you're spelling out the conditions. I prefer something much
> higher-level.
>
> This gives a minimum physical address at which the kernel can be
> loaded.
>
> CONFIG_RELOCATABLE=n kernels will be decompressed to and must
> run at PHYSICAL_START exactly.
>
> CONFIG_RELOCATABLE=y kernels can run at any address above
> PHYSICAL_START. If a kernel is loaded below PHYSICAL_START, it
> will relocate itself to PHYSICAL_START.

Happy to change that, yours is better.
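
To check my reading of that wording, here is a minimal sketch of the
rule as a standalone C function. final_run_address() is a hypothetical
helper for illustration, not kernel code, and it deliberately ignores
kernel_alignment:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the wording above, not kernel code.
 * Returns the physical address the kernel ends up running at.
 * Alignment handling is intentionally left out.
 */
static uint64_t final_run_address(bool relocatable, uint64_t load_addr,
				  uint64_t physical_start)
{
	/* CONFIG_RELOCATABLE=n: always runs at PHYSICAL_START. */
	if (!relocatable)
		return physical_start;

	/* CONFIG_RELOCATABLE=y: loaded below PHYSICAL_START, so it moves up. */
	if (load_addr < physical_start)
		return physical_start;

	/* CONFIG_RELOCATABLE=y: runs where the boot loader put it. */
	return load_addr;
}
```

The middle case is the one that matters for this patch: the move to
PHYSICAL_START happens regardless of what the memory map says about
that range.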

>
> > In normal kdump cases one does not have to set/change this option
> > as now bzImage can be compiled as a completely relocatable image
> > diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> > index a61c12c01270..5dcd232d58bf 100644
> > --- a/arch/x86/kernel/kexec-bzimage64.c
> > +++ b/arch/x86/kernel/kexec-bzimage64.c
> > @@ -498,7 +498,10 @@ static void *bzImage64_load(struct kimage *image, char *kernel,
> > kbuf.bufsz = kernel_len - kern16_size;
> > kbuf.memsz = PAGE_ALIGN(header->init_size);
> > kbuf.buf_align = header->kernel_alignment;
> > - kbuf.buf_min = MIN_KERNEL_LOAD_ADDR;
> > + if (header->pref_address < MIN_KERNEL_LOAD_ADDR)
> > + kbuf.buf_min = MIN_KERNEL_LOAD_ADDR;
> > + else
> > + kbuf.buf_min = header->pref_address;
> > kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> > ret = kexec_add_buffer(&kbuf);
> > if (ret)
>
> Comment, please.
>
> It isn't clear from this hunk why or how this fixes the bug. How does
> this manage to avoid clobbering reserved areas?

When the kernel is allocated at or above pref_address, it runs where
it was loaded and does not relocate itself to pref_address, an area
that potentially overlaps with reserved memory. I'll add a comment
saying so.
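
For reference, the selection in the hunk boils down to taking the
larger of the two addresses. A minimal standalone sketch, assuming
MIN_KERNEL_LOAD_ADDR is 0x100000 (defined locally here for
illustration) and using a hypothetical helper name, pick_buf_min():

```c
#include <stdint.h>

/* Assumed value, defined here only so the sketch is self-contained. */
#define MIN_KERNEL_LOAD_ADDR 0x100000ULL

/*
 * Hypothetical helper mirroring the hunk above: the lowest address the
 * kexec buffer may be placed at is the larger of MIN_KERNEL_LOAD_ADDR
 * and the bzImage header's pref_address.
 */
static uint64_t pick_buf_min(uint64_t pref_address)
{
	if (pref_address < MIN_KERNEL_LOAD_ADDR)
		return MIN_KERNEL_LOAD_ADDR;

	return pref_address;
}
```

With the default PHYSICAL_START of 0x1000000 as pref_address, this
yields 0x1000000, so the buffer never lands in the range that would
trigger the self-relocation.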

I'm not sure what the etiquette is: send a v2 of the patch immediately,
or wait for more comments. I'll err on the side of waiting for a couple
more comments before sending v2. Thanks for the review.

Chris