Re: [PATCH v14 0/7] crash: Kernel handling of CPU and memory hot un/plug

From: Sourabh Jain
Date: Wed Nov 23 2022 - 10:33:55 EST



On 17/11/22 03:16, Eric DeVolder wrote:
When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr, which describes the CPUs
and memory in the system, must also be updated, else the resulting
vmcore is inaccurate (e.g. missing either CPU context or memory
regions).

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (e.g. kernel, initrd, boot_params, purgatory and
elfcorehdr) by the userspace kexec utility. In previous posts I have
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash hot un/plug handler that
registers with the CPU and memory notifiers. Upon CPU or memory
changes, this generic handler is invoked and performs important
housekeeping, for example obtaining the appropriate lock, and then
invokes an architecture specific handler to do the appropriate
updates.
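
The flow described above (take a lock, check that a crash image is
loaded, then hand off to the architecture code) can be modeled in
userspace roughly as follows. This is only a sketch and not code from
the patchset: every name in it (hp_kind, crash_image_model,
arch_update_model, crash_hotplug_event_model) is a hypothetical
stand-in, and a C11 atomic_flag stands in for whatever lock the kernel
actually uses.

```c
/* Userspace model only; NOT kernel code. All names are hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

/* Kinds of hot un/plug events the generic handler may be told about. */
enum hp_kind { HP_ADD_CPU, HP_REMOVE_CPU, HP_ADD_MEMORY, HP_REMOVE_MEMORY };

/* Stand-in for the loaded kdump image and its elfcorehdr. */
struct crash_image_model {
    bool loaded;                     /* is a crash kernel loaded? */
    unsigned int elfcorehdr_updates; /* how often the header was redone */
};

static atomic_flag image_lock = ATOMIC_FLAG_INIT;
static struct crash_image_model image;

/* Stand-in for the architecture specific handler: on x86_64 this step
 * would regenerate the elfcorehdr and overwrite the old one. */
static void arch_update_model(struct crash_image_model *img)
{
    img->elfcorehdr_updates++;
}

/* Generic handler: housekeeping first, then the arch specific update.
 * Returns true if an update was performed. */
static bool crash_hotplug_event_model(enum hp_kind kind)
{
    bool updated = false;

    (void)kind; /* unused here, mirroring the removed hp_action */

    /* Spin until the lock is acquired. */
    while (atomic_flag_test_and_set(&image_lock))
        ;

    if (image.loaded) {
        arch_update_model(&image);
        updated = true;
    }

    atomic_flag_clear(&image_lock);
    return updated;
}
```

With no image loaded the handler does nothing and returns false; once
image.loaded is set, each event results in exactly one header update.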

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory; thus no
userspace involvement is needed.

To realize the benefits of, or to test, this patchset, one must make a couple
of minor changes to userspace:

- Prevent udev from updating kdump crash kernel on hot un/plug changes.
Add the following as the first lines to the RHEL udev rule file
/usr/lib/udev/rules.d/98-kexec.rules:

# The kernel handles updates to crash elfcorehdr for cpu and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

These lines will cause cpu and memory hot un/plug events to be
skipped within this rule file, with this changeset applied.

- Change to the kexec_file_load for loading the kdump kernel:
E.g. on RHEL: in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds the -s to select kexec_file_load syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility. A working changeset to the kexec userspace utility
is posted to the kexec-tools mailing list here:

http://lists.infradead.org/pipermail/kexec/2022-October/026032.html

To use the kexec-tools patch, apply, build and install kexec-tools,
then change kdumpctl's standard_kexec_args to replace the -s with
--hotplug. The removal of -s reverts to the kexec_load syscall and
the addition of --hotplug invokes the changes put forth in the
kexec-tools patch.

Regards,
eric
---
v14: 16nov2022
- Rebased onto 6.1.0-rc5
- Introduced CRASH_HOTPLUG Kconfig item to better fine tune
compilation of feature components, per Boris.
- Removed hp_action parameter to arch_crash_handle_hotplug_event()
as it is unused.

But what if an architecture has different actions based on hp type?
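
To illustrate the concern, here is a minimal userspace sketch of an
arch handler that needs to know the event type. It is purely
hypothetical: the enum values, the arch_crash_state_model struct and
arch_handle_event_model() are made up for illustration and do not
correspond to any existing architecture code. The assumption is an
architecture where CPU events and memory events require updates to
different pieces of crash state.

```c
/* Hypothetical model; not taken from any architecture's code. */
#include <stdbool.h>

/* Illustrative action codes the generic handler could pass down. */
enum hp_action { HP_ADD_CPU, HP_REMOVE_CPU, HP_ADD_MEMORY, HP_REMOVE_MEMORY };

/* Two independent pieces of crash state, counted for demonstration. */
struct arch_crash_state_model {
    unsigned int cpu_updates; /* updates triggered by CPU events */
    unsigned int mem_updates; /* updates triggered by memory events */
};

/* Arch handler that picks its action based on the event type.
 * Returns false for an action code it does not recognize. */
static bool arch_handle_event_model(struct arch_crash_state_model *st,
                                    enum hp_action action)
{
    switch (action) {
    case HP_ADD_CPU:
    case HP_REMOVE_CPU:
        st->cpu_updates++;
        return true;
    case HP_ADD_MEMORY:
    case HP_REMOVE_MEMORY:
        st->mem_updates++;
        return true;
    }
    return false;
}
```

Without the action parameter, a handler like this could not tell which
of the two updates to perform and would have to redo both on every
event.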

- Sourabh Jain