Re: [PATCH v21 2/7] crash: add generic infrastructure for crash hotplug support

From: Eric DeVolder
Date: Tue Apr 18 2023 - 09:56:45 EST




On 4/6/23 18:58, Baoquan He wrote:
On 04/06/23 at 11:10am, Eric DeVolder wrote:


On 4/6/23 06:04, Baoquan He wrote:
On 04/04/23 at 02:03pm, Eric DeVolder wrote:
......
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+ struct kimage *image;
+
+ /* Obtain lock while changing crash information */
+ if (!kexec_trylock()) {
+ pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+ return;
+ }
+
+ /* Check kdump is not loaded */
+ if (!kexec_crash_image)
+ goto out;
+
+ image = kexec_crash_image;
+
+ if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+ hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+ pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+ else
+ pr_debug("hp_action %u\n", hp_action);

Seems we passed in the cpu number just for printing here. Wondering why
we don't print out hot added/removed memory ranges. Is the cpu number
printing necessary?

Baoquan,

Ah, actually until recently it was used to track the 'offlinecpu' in this
function, but tglx pointed out that was un-necessary. That resulted in
dropping the code in this function dealing with offlinecpu, leaving this as
its only use in this function.

The printing of cpu number is not necessary, but helpful; I use it for debugging.

OK, I see. I am not requesting memory range printing, just try to prove
cpu number printing is not so justified. If it's helpful, I am OK with
it. Let's see if other people have concern about this.


I do not plan on adding the memory range printing.


The printing of memory range is also not necessary, but in order to do that,
should we choose to do so, requires passing in the memory range to this
function. This patch series did do this early on, and by v7 I dropped it at
your urging (https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devolder@xxxxxxxxxx/).
At the time, I provided it since I considered this generic infrastructure,
but I could not defend it since x86 didn't need it. However, PPC now needs
this, and is now carrying this as part of PPC support of CRASH_HOTPLUG (https://lore.kernel.org/linuxppc-dev/20230312181154.278900-6-sourabhjain@xxxxxxxxxxxxx/T/#u).

If you'd rather I pickup the memory range handling again, I can do that. I
think I'd likely change this function to be:

void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
struct memory_notify *mhp);

where on a CPU op the 'cpu' parameter would be valid and 'mhp' NULL, and on a memory op,
the 'mhp' would be valid and 'cpu' parameter invalid(0).

I'd likely then stuff these two parameters into struct kimage so that it can
be utilized by arch-specific handler, if needed.

And of course, would print out the memory range for debug purposes.

Let me know what you think.


I do not plan on adding the memory range handling; I'll let Sourabh do that as he has a use case for it.

As such, I don't see any other request for changes.

Thanks!
eric