Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()

From: Baoquan He
Date: Mon May 08 2023 - 01:14:40 EST


On 05/03/23 at 06:41pm, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
>
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump occur.
>
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
>
> To solve these problems, this patch introduces:
> - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
> safe for the kernel to modify the elfcorehdr (because kexec-tools
> has excluded the elfcorehdr from the digest).
> - the /sys/kernel/crash_elfcorehdr_size node to communicate to
> kexec-tools what the preferred size of the elfcorehdr memory buffer
> should be in order to accommodate hotplug changes.
> - The sysfs crash_hotplug nodes (ie.
> /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
> that they examine kexec_file_load() vs kexec_load(), and when
> kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
> This is critical so that the udev rule processing of crash_hotplug
> indicates correctly (ie. the userspace unload-then-load of the
> kdump of the kdump image can be skipped, or not).
>
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
>
> - For systems which have these kernel changes in place, but not the
> corresponding changes to the crash hot plug udev rules and
> kexec-tools, (ie "older" systems) those systems will continue to
> unload-then-load the kdump image, as has always been done. The
> kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
> - For systems which have these kernel changes in place and the proposed
> udev rule changes in place, but not the kexec-tools changes in place:
> - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
> so the unload-then-reload of kdump image will occur (the sysfs
> crash_hotplug nodes will show 0).
> - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
> to show 1, and the kernel will modify the elfcorehdr directly. And
> with the udev changes in place, the unload-then-load will not occur!
> - For systems which have these kernel changes as well as the udev and
> kexec-tools changes in place, then the user/admin has full authority
> over the enablement and support of crash hotplug support, whether via
> kexec_file_load() or kexec_load().
>
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
>
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
>
> Suggested-by: Hari Bathini <hbathini@xxxxxxxxxxxxx>
> Signed-off-by: Eric DeVolder <eric.devolder@xxxxxxxxxx>

LGTM,

Acked-by: Baoquan He <bhe@xxxxxxxxxx>