Re: [RFC PATCH 1/2] Revert "x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel"

From: Baoquan He
Date: Mon Mar 04 2024 - 22:43:20 EST


On 03/04/24 at 12:11pm, Borislav Petkov wrote:
> On Mon, Mar 04, 2024 at 06:51:26PM +0800, Baoquan He wrote:
> > It's not true. Customer may want to try to load a different kernel if
>
> "may want" is one of those hypothetical things which we don't do. If we
> have to support everything a customer *may* want, then the kernel will
> be a madness.
>
> Also, you do realize that the kernel doesn't care about "customers",
> right?

I guess you mean the upstream kernel doesn't care about 'customers'.
Downstream kernels do care about customers.

>
> And the question is, how *sensible* is such a use case?
>
> In my experience, not at all. You simply take the same kernel or a very
> similar one and kexec it.

Hmm, there's a different view between upstream and downstream. For distro
kernels, we need a lot of testing to make sure a kernel is trustworthy
as a kdump kernel. Here, 'a lot of testing' means a long list of use cases
for kexec/kdump. Please see the file below from the CentOS kexec-tools
package:

https://git.centos.org/rpms/kexec-tools/blob/bb7919506eba39a2b7277c8d36fe1774f9c33428/f/SOURCES/supported-kdump-targets.txt

And the kdump kernel doesn't have to be the same kernel as the 1st kernel.
I can give several examples:

1) Nvidia or AMD GPUs don't work well when kexec/kdump jumps to the
2nd kernel in some releases. When we meet that case, we want to use the
newer kernel as the 1st kernel. We also want to deploy a kdump kernel to
capture the vmcore for analysis once corruption is encountered. Then the
old kernel, which has been tested and proven to work well, can be
configured as the 2nd kernel.

2) In Red Hat's internal testing, we also run a debugging kernel to
test, while the debugging kernel requires much more memory to boot up and
run than a normal kernel, e.g. the KASAN feature will eat up 1/8 of
system memory. In this case, we run the debugging kernel as the 1st
kernel, but configure a normal (non-debugging) kernel as the kdump kernel.

And the original purpose of the kexec feature Eric developed is to let
kernel developers jump into a new and different kernel. We never
enforce that users have to set the kexec/kdump kernel to the currently
running kernel. But we do need to explain why if a kernel can't be set as
a kdump kernel when it's different from the currently running kernel. E.g.
the kdump kernel is too old, or like this 5-level case, jumping from
5-level to 4-level will fail.

>
> > they have taken many testings and trust that kdump kernel, or for
> > debugging.
>
> Yes, and those kernels will have 5level too. Practically, distros must
> enable 5level support in their kernels in order to support modern hw.
>
> > The similar for kexec reboot into 2nd kernel. We don't enforce
> > kexec/kdump to work on the same kernel as the 1st kernel. With the
> > fail and message, user can take measure to avoid that. it's better the
> > failure is encountered when failing to jump to kexec/kdump kernel.
>
> I can't parse that example.
>
> Btw, kexec tools don't use those XLF_5LEVEL* flags bits either. Which
> basically means we don't really need them.

No, it's not true. Kexec-tools doesn't check it, which means the kexec_load
interface doesn't check the flag. But it's set in xloadflags, and checked
in kexec_file_load. As we know, kexec_file_load implements most of the code
in the kernel. At that time, people were discussing whether to continue
adding new features to the kexec_load interface.

In this patch 1, you are removing the flag check for the kexec_file_load
interface, which RHEL/Fedora use by default.

> > I remember we have a use case where a customer used a kdump kernel
> > different than the 1st kernel. While I don't remember why.
>
> See above.
>
> And that customer can still use the old distro kernels which have those
> flags.

These two patches include two parts of work. One is marking whether the
kernel itself supports 5-level or not. The other is: if I am the running
kernel and capable of 5-level, I need to check whether the kernel being
loaded is capable of 5-level. The 2nd part is executed when the
kexec_file_load interface is invoked with 'kexec -s'.
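
To make the 2nd part concrete, below is a minimal userspace sketch of the
load-time check, not the kernel code itself. The XLF_5LEVEL bit value is
quoted from memory of arch/x86/include/uapi/asm/bootparam.h; struct
fake_setup_header, check_5level_compat() and the running_l5 parameter are
made-up stand-ins for the real setup_header and for pgtable_l5_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Part 1: the image marks itself as able to handle 5-level paging. */
#define XLF_5LEVEL (1 << 5)	/* value assumed from bootparam.h */

/* Hypothetical stand-in for the xloadflags field of struct setup_header. */
struct fake_setup_header {
	uint16_t xloadflags;
};

/*
 * Part 2: the running kernel checks the image being loaded.
 * running_l5 stands in for pgtable_l5_enabled() (assumption).
 * Returns 0 if the load may proceed, -1 if it must be refused.
 */
static int check_5level_compat(const struct fake_setup_header *hdr,
			       bool running_l5)
{
	assert(hdr != NULL);

	/* 5-level 1st kernel, image without the flag: refuse at load time. */
	if (running_l5 && !(hdr->xloadflags & XLF_5LEVEL))
		return -1;

	return 0;
}
```

Only the 5-level to non-5-level combination is refused here; every other
combination is allowed to load.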

If we take off the check, and people want to jump from the new kernel
to an old kernel where the 5-level code hasn't been added or
CONFIG_X86_5LEVEL is unset on purpose, it won't fail and print a message
at load time; instead the 2nd kernel will silently fail to boot. E.g. the
coming RHEL10 is anchored on an upstream kernel w/o the flag check, and
people want to kexec/kdump from RHEL10 to an old RHEL7 kernel. It could
be an extreme case, but it reveals the scenario.
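
As an illustration only, a user could inspect an image for the flag before
trying such a jump. The sketch below assumes the header offsets from the
x86 boot protocol document as I recall them ("HdrS" at 0x202, version at
0x206, xloadflags at 0x236 since protocol 2.12); image_advertises_5level()
and le16() are hypothetical helper names, not existing kexec-tools code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets assumed from the x86 boot protocol documentation. */
#define HDR_MAGIC_OFF   0x202	/* "HdrS" signature */
#define HDR_VERSION_OFF 0x206	/* protocol version, little-endian */
#define XLOADFLAGS_OFF  0x236	/* xloadflags, protocol >= 2.12 */
#define XLF_5LEVEL      (1 << 5)

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: returns 1 if the image sets XLF_5LEVEL,
 * 0 if it does not, -1 if the buffer is not a readable bzImage header.
 */
static int image_advertises_5level(const unsigned char *img, size_t len)
{
	assert(img != NULL);

	if (len < XLOADFLAGS_OFF + 2)
		return -1;
	if (memcmp(img + HDR_MAGIC_OFF, "HdrS", 4) != 0)
		return -1;
	/* Before protocol 2.12 there is no xloadflags field at all. */
	if (le16(img + HDR_VERSION_OFF) < 0x020c)
		return 0;

	return (le16(img + XLOADFLAGS_OFF) & XLF_5LEVEL) ? 1 : 0;
}
```

The buffer would be the first bytes of the bzImage file; reading the file
itself is left out to keep the sketch short.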

> The point here is, going forward, 5level becomes ubiquitous and will be
> even more tightly integrated in the kernel so that it'll become just
> another default feature which is either there or not.
>
> So the distinction is going away and the flags can go too.

I understand this and it makes sense to me; the existing code needs to be
weighed against realistic usage. I will check with our QE and support
engineers to see how far the kexec-ed/kdump target kernel is allowed to
be from the 1st kernel in our support or possible use cases. If no use
case is a concern, we can take off the flags and the check. Will report
back once I get feedback.

Thanks
Baoquan