Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

From: Eric DeVolder
Date: Fri Sep 30 2022 - 11:39:48 EST




On 9/28/22 11:07, Borislav Petkov wrote:
On Tue, Sep 13, 2022 at 02:12:31PM -0500, Eric DeVolder wrote:
This topic was discussed previously https://lkml.org/lkml/2022/3/3/372.

Please do not use lkml.org to refer to lkml messages. We have a
perfectly fine archival system at lore.kernel.org. You simply do

https://lore.kernel.org/r/<Message-ID>

when you want to point to a previous mail.

ok, thanks for pointing that out to me.

David points out that terminology is tricky here due to differing behaviors.
And perhaps that is your point in asking for guidance text. It can be
complicated

Which means you need an explanation how to use this even more.

And why is CONFIG_CRASH_MAX_MEMORY_RANGES even a Kconfig item and not
something you discover from the hardware?

No, is the short answer.


Your help text talks about System RAM entries in /proc/iomem which means
that those entries are present somewhere in the kernel and you can read
them out and do the proper calculations dynamically instead of doing the
static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.

The intent is to compute the max size buffer needed to contain a maximum populated elfcorehdr, which is primarily based on the number of CPUs and memory regions. Thus far I (and others involved) have not found a kernel method to determine the maximum number of memory regions possible (if you are aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES was born (rather borrowed from kexec-tools).

So no dynamic computation is possible, yet.


, but it all comes down to System RAM entries.

I could perhaps offer an overly simplified example such that for 1GiB block
size, for example, the CRASH_MAX_MEMORY_RANGES of 32768 would allow for 32TiB
of memory?

Yes, and stick it somewhere in Documentation/admin-guide/kdump/ and
refer to it in that help text so that people can find it and read how to
use your new option.

ok

The kbuf.bufsz value is obtained via a call to prepare_elf_headers(); I can
not initialize it at its declaration.

Sorry, I meant this:

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8fc7d678ac72..ee6fd9f1b2b9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
if (ret)
return ret;
- image->elf_headers = kbuf.buffer;
- image->elf_headers_sz = kbuf.bufsz;
+ image->elf_headers = kbuf.buffer;
+ image->elf_headers_sz = kbuf.bufsz;
+ kbuf.memsz = kbuf.bufsz;
#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
/* Ensure elfcorehdr segment large enough for hotplug changes */
@@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
image->elf_headers_sz = kbuf.memsz;
image->elfcorehdr_index = image->nr_segments;
image->elfcorehdr_index_valid = true;
-#else
- kbuf.memsz = kbuf.bufsz;
#endif
+
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);

ok

I'm at a loss as to what to do differently here. You've raised this issue
before and I went back and looked at the suggestions then and I don't see
how that applies to this situation. How is this situation different than the
#ifdef CONFIG_KEXEC_FILE that immediately preceeds it?

See the diff at the end. I'm not saying this is how you should do it
but it should give you a better idea. The logic being, the functions
in the .c file don't really need ifdeffery around them - you're adding
1-2 functions and crash.c is not that big - so they can be built in
unconditionally. You'd need the ifdeffery *in the header only* when
crash.c is not being built.
ok; I've overlooked that scenario.

But I've done it with ifdeffery in the .c file now because yes, the
kexec code is a minefield of ifdeffery. Hell, there's ifdeffery even in
the headers for structs. Ifdeffery you don't really need. Someone should
clean that up and simplify this immensely.

ok


Currently there is a concurrent effort for PPC support by Sourabh
Jain, and in that effort arch_map_crash_pages() is using __va(paddr).

Why?

I do not know the nuances between kmap_local_page() and __va() to
answer the question.

kmap_local_page() is a generic interface and it should work on any arch.

And it is documented even:

$ git grep kmap_local_page Documentation/

If kmap_local_page() works for all archs, then I'm happy to drop these
arch_ variants and use it directly.

Yes, pls do.

I'll check with Sourabh to see if PPC can work with kmap_local_page().


---

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 432073385b2d..b73c9628cd85 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -205,6 +205,17 @@ void *arch_kexec_kernel_image_load(struct kimage *image);
int arch_kimage_file_post_load_cleanup(struct kimage *image);
#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
+
+#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
So I think the use of CONFIG_CRASH_MAX_MEMORY_RANGES is not correct; it still needs to be based on the cpu or memory hotplug options.

+void *arch_map_crash_pages(unsigned long paddr, unsigned long size);
+void arch_unmap_crash_pages(void **ptr);
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action);
+#else
+void *arch_map_crash_pages(unsigned long paddr, unsigned long size) { return NULL; }
+void arch_unmap_crash_pages(void **ptr) { }
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
+#endif
+
#endif
#endif
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8fc7d678ac72..a526c893abe8 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
if (ret)
return ret;
- image->elf_headers = kbuf.buffer;
- image->elf_headers_sz = kbuf.bufsz;
+ image->elf_headers = kbuf.buffer;
+ image->elf_headers_sz = kbuf.bufsz;
+ kbuf.memsz = kbuf.bufsz;
#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
/* Ensure elfcorehdr segment large enough for hotplug changes */
@@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
image->elf_headers_sz = kbuf.memsz;
image->elfcorehdr_index = image->nr_segments;
image->elfcorehdr_index_valid = true;
-#else
- kbuf.memsz = kbuf.bufsz;
#endif
+
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -425,7 +425,8 @@ int crash_load_segments(struct kimage *image)
}
#endif /* CONFIG_KEXEC_FILE */
-#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
Again, I don't think CONFIG_CRASH_MAX_MEMORY_RANGES makes sense, at all.

+
/*
* NOTE: The addresses and sizes passed to this routine have
* already been fully aligned on page boundaries. There is no
@@ -462,8 +463,7 @@ void arch_unmap_crash_pages(void **ptr)
* is prepared in a kernel buffer, and then it is written on top
* of the existing/old elfcorehdr.
*/
-void arch_crash_handle_hotplug_event(struct kimage *image,
- unsigned int hp_action)
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action)
{
struct kexec_segment *ksegment;
unsigned char *ptr = NULL;
@@ -513,4 +513,5 @@ void arch_crash_handle_hotplug_event(struct kimage *image,
if (elfbuf)
vfree(elfbuf);
}
-#endif
+
+#endif /* CONFIG_CRASH_MAX_MEMORY_RANGES */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index a48577a36fb8..0f79ad4c4f80 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -27,6 +27,19 @@ extern struct resource crashk_res;
extern struct resource crashk_low_res;
extern note_buf_t __percpu *crash_notes;
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN 4096
+
+struct crash_mem_range {
+ u64 start, end;
+};
+
+struct crash_mem {
+ unsigned int max_nr_ranges;
+ unsigned int nr_ranges;
+ struct crash_mem_range ranges[];
+};
+
#ifdef CONFIG_KEXEC_CORE
#include <linux/list.h>
#include <linux/compat.h>
@@ -237,19 +250,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
}
#endif
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN 4096
-
-struct crash_mem_range {
- u64 start, end;
-};
-
-struct crash_mem {
- unsigned int max_nr_ranges;
- unsigned int nr_ranges;
- struct crash_mem_range ranges[];
-};
-
extern int crash_exclude_mem_range(struct crash_mem *mem,
unsigned long long mstart,
unsigned long long mend);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 5bc5159d9cb1..f6b5d835f826 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void)
subsys_initcall(crash_save_vmcoreinfo_init);
#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+
+void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size)
+{
+ return NULL;
+}
+
+void __weak arch_unmap_crash_pages(void **ptr) { }
+void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
+
I was asked by Baoquan He to eliminate the use of __weak, which I did. I followed the technique used by other kexec infrastructure.


/*
* To accurately reflect hot un/plug changes, the elfcorehdr (which
* is passed to the crash kernel via the elfcorehdr= parameter)