Re: [External] Re: [PATCH] x86/purgatory: provide config to disable purgatory

From: Usama Arif
Date: Mon Dec 06 2021 - 11:51:15 EST


Hi,

I have sent a v3 of the patch with a much clearer commit message, please
let me know if there are any comments for the v3 patch or my responses below.

Thanks,
Usama

On 01/12/2021 10:29, Usama Arif wrote:
Hi,

(Resending the reply as my email client had updated and inserted html code and caused a bounceback from the mailing list, sorry about that.)

Thanks for your reply, I have responded with further comments/questions inline below, and also have provided some context for the patch at the start:

The patch is not introducing a new CONFIG option, as can be seen in the v2 patch diff. It is converting an existing CONFIG option that only enabled purgatory for specific architectures in which it has been implemented (x86, powerpc and s390) to an option that can that allow purgatory to be disabled (with default enabled) and only provides code to disable purgatory for x86 only. From what i see, purgatory is currently not yet implemented in other architectures like arm64 kexec_file case and riscv, so this would on a high level,
provide only the option to make kexec on x86 similar to other the other architectures.

As a background to the discussion, the usecase we are aiming is to update the host kernel with kexec in servers which is managing multiple VMs, to cut down the downtime of servers as much as possible so that its not noticeable to VMs. The patch is aimed at x86 purgatory code specifically. We are targetting other optimizations as well in other areas in boot path to cut down the 600ms time much further, so that cut down of 200ms downtime is significant in the usecase described. I did have a few comments/questions about your reply, please see my responses below:


On 29/11/2021 16:53, Eric W. Biederman wrote:
Usama Arif <usama.arif@xxxxxxxxxxxxx> writes:

Hi,

Thanks for your replies. I have submitted a v2 of the patch with a
much more detailed commit message including reason for the patch and timing values.

The time taken from reboot to running init process was measured
with both purgatory enabled and disabled over 20 runs and the
averages are:
Purgatory disabled:
- TSC = 3908766161 cycles
- ktime = 606.8 ms
Purgatory enabled:
- TSC = 5005811885 cycles (28.1% worse)
- ktime = 843.1 ms (38.9% worse)


Our reason for this patch is that it helps reduce the downtime of servers when
the host kernel managing multiple VMs needs to be updated via kexec,
but it makes reboot with kexec much faster so should be a general improvement in
boot time if purgatory is not needed and could have other usecases as well.
I believe only x86, powerpc and s390 have purgatory supported, other platforms
like arm64 dont have it implemented yet, so with the reboot time improvement seen,
it would be a good idea to have the option to disable purgatory completely but set default to y.
We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be enabled to verify the next
kernel image to be booted and purgatory can be completely skipped if
not required.

CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
different.  It's job is to verify that the kernel to be booted comes
from a trusted source.   The sha256 verification in purgatory's job
is to verify that memory the kernel cares about was not corrupted
during the kexec process.


Thanks, acknowledged.

I believe when you say purgatory you are really talking about that
sha256 checksum.  It really is not possible to disable all of
the code that runs between kernels, as the old and the new kernel may
run at the same addresses.  Anything that runs between the two kernels
is what is referred to as purgatory.  Even if it is just a small
assembly stub.

>

With this patch, i am trying to give an option (with default purgatory enabled)to disable purgatory completely on x86 only, i.e. no code running between 2 kernels on x86. From my understanding there is no purgatory in arm64 kexec_file case, in riscv and in some other archs as well, so I am not sure why its not possible to disable purgatory (all code running between 2 kernels) in x86? Unless i misunderstood something about the working of other platforms? In x86 case, from what i see relocate_kernel is still part of the older kernel and not purgatory, and with my patch, purgatory.ro is not built and kexec does execute successfully with purgatory disabled.

About your point for the old and the new kernel may run at the same addresses,i dont think that can happen as in bzImage64_load function its loaded using the kbuf struct. This doesnt happen in other architectures (for e.g. arm64) that dont implement purgatory and any of the tests I conducted with the patch applied due to the use of kbuf struct.

From what i see in the code, purgatory in x86 specifically, has 2 main functions, sha256 verification and loading the %rsi register for bootparam_load_addr. In this patch purgatory was disabled, which resulted in sha256 verification disabled and bootparam_load_addr moved to relocate_kernel. Our analysis revealed that most of the time is spent in the sha256 verification, so I would be ok to reformat the patch
to make the sha verification optional and keep purgatory enabled if thats preferred? Although in my opinion the current patch of only providing an option to disable purgatory seems much better.


That sha256 verification is always needed for kexec on panic, there are
by the nature of a kernel panic too many unknowns to have any confidence
the new kernel will not be corrupted in the process of kexec before it
gets started.

For an ordinary kexec it might be possible to say that you have a
reliable kernel shutdown process and you know for a fact that something
won't come along and corrupt the kernel.  I find that a questionable
assertion.  I haven't seen anyone yet whose focus when getting an
ordinary kexec to work as anything other than making certain all of the
drivers are shutdown properly.

I have seen countless times when a network packet comes in a the wrong
time and the target kernel's memory is corrupted before it gets far
enough to initialize the network driver. >

I agree that when doing testing and research, there are things that can
go wrong during kexec process, but i assume that the expectation is that
in production environment, when for e.g. updating a kernel with kexec in servers,the expectation is that kexec will execute successfully as long as there is nothing wrong in the old kernel and the new kernel.

Maybe I didn't understand this correctly, but if a network packet comes in and corrupts the kernel memory, wont it also cause a problem even when purgatory is enabled? I agree that issue like these would be caught earlier with purgatory, but then if something like this breaks the kexec process in a production environment where the old and new kernel are well tested, trusted and expected to work, wouldn't there be a much more fundamental issue with the reliability of the current kexec process.

For a 0.2s speed up you are talking about disabling all of the safety
checks in a very dangerous situation.  How much can you can in
performance by optimizing the sha256 implementation instead of using
what is essentially a reference implementation in basic C that I copied
from somewhere long ago.

Optimize the sha256 implementation and the memory copy loop and then
show how the tiny bit of time that is left is on a mission critical path
and must be removed.  Then we can reasonably talk about a config option
for disabling the sha256 implementation in the kexec in not-panic case.

Thanks for this, I assume that over here you are suggesting for e.g. in x86 to replace the existing sha verification implementation in purgatory with the one in arch/x86/crypto/sha256_ssse3_glue.c? Also could you point to the memory copy loop to optimize? I can have a look at these 2 options.

Even with these optimizations, i think the v2 patch should still be considered. The patch in the end is providing only an option to disable purgatory on x86 only, with the default value of keeping it enabled and not changing kexec behaviour. The CONFIG already existed, it is just renamed and now provides user the option to select to disable purgatory for x86, rather than it being architecture dependent.

Thanks again for the review and comments,
Usama


That sha256 implementation in part so that we can all sleep at night
because we don't have to deal with very very strange heizenbugs.

Eric