[PATCH v1] panic: push panic() messages to the console even from the MCE nmi handler

From: “William Roche
Date: Mon Jan 04 2021 - 16:20:05 EST


From: William Roche <william.roche@xxxxxxxxxx>

Force push panic messages to the console as panic() can be called from NMI
interrupt handler functions where printed messages can't always reach the
console without an explicit push provided by printk_safe_flush_on_panic()
and console_flush_on_panic().
This is the case with the MCE handler that can lead to a system panic
giving information on the fatal MCE root cause that must reach the console.

Signed-off-by: William Roche <william.roche@xxxxxxxxxx>
---

Notes:
While testing MCE injection and kernel reaction, we discovered a bug
in the way the kernel provides the panic reason information: When dealing
with fatal MCE, the machine (physical or virtual) can reboot without
leaving any message on the console.

This behavior can be reproduced on Intel with the mce-inject tool
with a simple:
# modprobe mce-inject
# mce-inject test/uncorrected

The investigations showed that the MCE panic can be totally message-less
or can give a small set of messages. This behavior depends on the use of the
crash_kexec mechanism (using the "crashkernel" parameter). Not using this
parameter, we get a partial [Hardware Error] information on panic, but some
important notifications can be missing. And when using it, a fatal MCE can
panic the system without leaving any information.

. Without "crashkernel", a Fatal MCE injection shows:

[ 212.153928] mce: Machine check injector initialized
[ 236.730682] mce: Triggering MCE exception on CPU 0
[ 236.731304] Disabling lock debugging due to kernel taint
[ 236.731947] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 1: b000000000000000
[ 236.731948] mce: [Hardware Error]: TSC 78418fb4a83f
[ 236.731949] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1605312952 SOCKET 0 APIC 0 microcode 1
[ 236.731949] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 236.731950] mce: [Hardware Error]: Machine check: MCIP not set in MCA handler
[ 236.731950] Kernel panic - not syncing: Fatal machine check
[ 236.732047] Kernel Offset: disabled

The system hangs 30 seconds without any additional message, and finally
reboots.

. With the use of "crashkernel", a Fatal MCE injection shows only the
injection message

[ 80.811708] mce: Machine check injector initialized
[ 92.298755] mce: Triggering MCE exception on CPU 0
[ 92.299362] Disabling lock debugging due to kernel taint

No other messages is displayed and the system reboots immediately.

The reason behind this defective behavior is the fact that
do_machine_check() MCE handler can call mce_panic() to finish in panic()
trying to print out messages on the console. So we end up trying to print
out messages directly from this interrupt handler, which can require the
help of additional function calls like printk_safe_flush_on_panic() and
console_flush_on_panic().

With the suggested fix here, the above output turns into the expected:

. Without crashkernel:
[ 92.025010] mce: Machine check injector initialized
[ 113.006798] mce: Triggering MCE exception on CPU 0
[ 113.007391] Disabling lock debugging due to kernel taint
[ 113.008061] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 1: b000000000000000
[ 113.008062] mce: [Hardware Error]: TSC 2ec380a0f187
[ 113.008063] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1606223541 SOCKET 0 APIC 0 microcode 1
[ 113.008063] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 113.008064] mce: [Hardware Error]: Machine check: MCIP not set in MCA handler
[ 113.008064] Kernel panic - not syncing: Fatal machine check
[ 113.013010] Kernel Offset: disabled
[ 113.013426] Rebooting in 30 seconds..

Note that we now have the important information about the Reboot in
30 seconds.

. With crashkernel:
[ 75.770143] mce: Machine check injector initialized
[ 100.923650] mce: Triggering MCE exception on CPU 0
[ 100.924415] Disabling lock debugging due to kernel taint
[ 100.925094] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 1: b000000000000000
[ 100.925095] mce: [Hardware Error]: TSC 2f3aa07b03de
[ 100.925096] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1606223692 SOCKET 0 APIC 0 microcode 1
[ 100.925097] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 100.925098] mce: [Hardware Error]: Machine check: MCIP not set in MCA handler
[ 100.925100] Kernel panic - not syncing: Fatal machine check

Here the reboot is immediate but the MCE error message reaches the
console.

Could you please review this fix ?

Thanks in advance for any feedback you could have.
William.

kernel/panic.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a..eb90cc0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -166,6 +166,15 @@ static void panic_print_sys_info(void)
ftrace_dump(DUMP_ALL);
}

+/*
+ * Force flush messages to the console.
+ */
+static void panic_flush_to_console(void)
+{
+ printk_safe_flush_on_panic();
+ console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -247,7 +256,7 @@ void panic(const char *fmt, ...)
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
if (!_crash_kexec_post_notifiers) {
- printk_safe_flush_on_panic();
+ panic_flush_to_console();
__crash_kexec(NULL);

/*
@@ -271,9 +280,8 @@ void panic(const char *fmt, ...)
*/
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

- /* Call flush even twice. It tries harder with a single online CPU */
- printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
+ panic_flush_to_console();

/*
* If you doubt kdump always works fine in any situation,
@@ -298,7 +306,7 @@ void panic(const char *fmt, ...)
* buffer. Try to acquire the lock then release it regardless of the
* result. The release will also print the buffers out. Locks debug
* should be disabled to avoid reporting bad unlock balance when
- * panic() is not being callled from OOPS.
+ * panic() is not being called from OOPS.
*/
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
@@ -314,6 +322,7 @@ void panic(const char *fmt, ...)
* We can't use the "normal" timers since we just panicked.
*/
pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+ panic_flush_to_console();

for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
touch_nmi_watchdog();
@@ -347,6 +356,7 @@ void panic(const char *fmt, ...)
disabled_wait();
#endif
pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
+ panic_flush_to_console();

/* Do not scroll important messages printed above */
suppress_printk = 1;
--
1.8.3.1