Re: [PATCH v21 2/7] crash: add generic infrastructure for crash hotplug support

From: Sourabh Jain
Date: Wed Apr 12 2023 - 04:59:25 EST



On 06/04/23 21:40, Eric DeVolder wrote:


On 4/6/23 06:04, Baoquan He wrote:
On 04/04/23 at 02:03pm, Eric DeVolder wrote:
......
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+    struct kimage *image;
+
+    /* Obtain lock while changing crash information */
+    if (!kexec_trylock()) {
+        pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+        return;
+    }
+
+    /* Check kdump is not loaded */
+    if (!kexec_crash_image)
+        goto out;
+
+    image = kexec_crash_image;
+
+    if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+        hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+        pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+    else
+        pr_debug("hp_action %u\n", hp_action);

Seems we passed in the cpu number just for printing here. Wondering why
we don't print out hot added/removed memory ranges. Is the cpu number
printing necessary?

Baoquan,

Ah, actually until recently it was used to track the 'offlinecpu' in this function, but tglx pointed out that was un-necessary. That resulted in dropping the code in this function dealing with offlinecpu, leaving this as its only use in this function.

The printing of cpu number is not necessary, but helpful; I use it for debugging.

The printing of memory range is also not necessary, but in order to do that, should we choose to do so, requires passing in the memory range to this function. This patch series did do this early on, and by v7 I dropped it at your urging (https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devolder@xxxxxxxxxx/). At the time, I provided it since I considered this generic infrastructure, but I could not defend it since x86 didn't need it. However, PPC now needs this, and is now carrying this as part of PPC support of CRASH_HOTPLUG (https://lore.kernel.org/linuxppc-dev/20230312181154.278900-6-sourabhjain@xxxxxxxxxxxxx/T/#u).

If you'd rather I pickup the memory range handling again, I can do that. I think I'd likely change this function to be:

  void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
     struct memory_notify *mhp);

where on a CPU op the 'cpu' parameter would be valid and 'mhp' NULL, and on a memory op,
the 'mhp' would be valid and 'cpu' parameter invalid(0).

I'd likely then stuff these two parameters into struct kimage so that it can be utilized by arch-specific handler, if needed.

And of course, would print out the memory range for debug purposes.

I think passing memory_notify as parameter is a better approach compare to adding the
same into struct kimage. Because once the crash hotplug event is served the memory_notify
object is not useful.

- Sourabh