Re: [PATCH v2] lkdtm/bugs: add test for panic() with stuck secondary CPUs

From: Stephen Boyd
Date: Fri Sep 22 2023 - 17:37:10 EST


Quoting Mark Rutland (2023-09-21 09:16:34)
> Upon a panic() the kernel will use either smp_send_stop() or
> crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
> which may or may not be an NMI. Generally it's preferable that this is an
> NMI so that CPUs can be stopped in as many situations as possible, but
> it's not always possible to provide an NMI, and there are cases where
> CPUs may be unable to handle the NMI regardless.
>
> This patch adds a test for panic() where all other CPUs are stuck with
> interrupts disabled, which can be used to check whether the kernel
> gracefully handles CPUs failing to respond to a stop, and whether NMIs
> actually work to stop CPUs.
>
> For example, on arm64 *without* an NMI, this results in:
>
> | # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry PANIC_STOP_IRQOFF
> | Kernel panic - not syncing: panic stop irqoff test
> | CPU: 2 PID: 24 Comm: migration/2 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
> | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> | Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
> | Call trace:
> | dump_backtrace+0x94/0xec
> | show_stack+0x18/0x24
> | dump_stack_lvl+0x74/0xc0
> | dump_stack+0x18/0x24
> | panic+0x358/0x3e8
> | lkdtm_PANIC+0x0/0x18
> | multi_cpu_stop+0x9c/0x1a0
> | cpu_stopper_thread+0x84/0x118
> | smpboot_thread_fn+0x224/0x248
> | kthread+0x114/0x118
> | ret_from_fork+0x10/0x20
> | SMP: stopping secondary CPUs
> | SMP: failed to stop secondary CPUs 0-3
> | Kernel Offset: 0x401cf3490000 from 0xffff800080000000
> | PHYS_OFFSET: 0x40000000
> | CPU features: 0x00000000,68c167a1,cce6773f
> | Memory Limit: none
> | ---[ end Kernel panic - not syncing: panic stop irqoff test ]---
>
> Note the "failed to stop secondary CPUs 0-3" message.
>
> On arm64 *with* an NMI, this results in:
>
> | # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry PANIC_STOP_IRQOFF
> | Kernel panic - not syncing: panic stop irqoff test
> | CPU: 1 PID: 19 Comm: migration/1 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
> | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> | Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
> | Call trace:
> | dump_backtrace+0x94/0xec
> | show_stack+0x18/0x24
> | dump_stack_lvl+0x74/0xc0
> | dump_stack+0x18/0x24
> | panic+0x358/0x3e8
> | lkdtm_PANIC+0x0/0x18
> | multi_cpu_stop+0x9c/0x1a0
> | cpu_stopper_thread+0x84/0x118
> | smpboot_thread_fn+0x224/0x248
> | kthread+0x114/0x118
> | ret_from_fork+0x10/0x20
> | SMP: stopping secondary CPUs
> | Kernel Offset: 0x55a9c0bc0000 from 0xffff800080000000
> | PHYS_OFFSET: 0x40000000
> | CPU features: 0x00000000,68c167a1,fce6773f
> | Memory Limit: none
> | ---[ end Kernel panic - not syncing: panic stop irqoff test ]---
>
> Note the absence of a "failed to stop secondary CPUs" message, since we
> don't log anything when secondary CPUs are successfully stopped.
>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Douglas Anderson <dianders@xxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Stephen Boyd <swboyd@xxxxxxxxxxxx>
> Cc: Sumit Garg <sumit.garg@xxxxxxxxxx>
> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---

Reviewed-by: Stephen Boyd <swboyd@xxxxxxxxxxxx>