Re: [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n

From: Greg KH
Date: Tue Mar 26 2019 - 21:12:52 EST


On Tue, Mar 26, 2019 at 05:36:05PM +0100, Thomas Gleixner wrote:
> Tianyu reported a crash in a CPU hotplug teardown callback when booting a
> kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
> parameter.
>
> It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
> forever in case that a bringup callback fails. Unfortunately this issue was
> not recognized when the CPU hotplug code was reworked, so the shortcoming
> just stayed in place.
>
> When a bringup callback fails, the CPU hotplug code rolls back the
> operation and takes the CPU offline.
>
> The 'nosmt' command line argument uses a bringup failure to abort the
> bringup of SMT sibling CPUs. This partial bringup is required due to the
> MCE misdesign on Intel CPUs.
>
> With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
> CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
> teardown of a CPU including the synchronizations in various facilities like
> RCU, NOHZ and others.
>
> As a consequence the teardown callbacks which must be executed on the
> outgoing CPU within stop machine with interrupts disabled are executed on
> the control CPU in interrupt enabled and preemptible context causing the
> kernel to crash and burn. The pre state machine code has a different
> failure mode which is more subtle and resulting in a less obvious use after
> free crash because the control side frees resources which are still in use
> by the undead CPU.
>
> But this is not a x86 only problem. Any architecture which supports the
> SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
> likely to be triggered because in 99.99999% of the cases all bringup
> callbacks succeed.
>
> The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
> all architectures as the following architectures have either no hotplug
> support at all or not all subarchitectures support it:
>
> alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
>
> Crashing the kernel in such a situation is not an acceptable state
> either.
>
> Implement a minimal rollback variant by limiting the teardown to the point
> where all regular teardown callbacks have been invoked and leave the CPU in
> the 'dead' idle state. This has the following consequences:
>
> - the CPU is brought down to the point where the stop_machine takedown
> would happen.
>
> - the CPU stays there forever and is idle
>
> - The CPU is cleared in the CPU active mask, but not in the CPU online
> mask which is a legit state.
>
> - Interrupts are not forced away from the CPU
>
> - All facilities which only look at online mask would still see it, but
> that is the case during normal hotplug/unplug operations as well. It's
> just a (way) longer time frame.
>
> This will expose issues, which haven't been exposed before or only seldom,
> because now the normally transient state of being non active but online is
> a permanent state. In testing this exposed already an issue vs. work queues
> where the vmstat code schedules work on the almost dead CPU which ends up
> in an unbound workqueue and triggers 'preemtible context' warnings. This is
> not a problem of this change, it merily exposes an already existing issue.
> Still this is better than crashing fully without a chance to debug it.
>
> This is mainly thought as workaround for those architectures which do not
> support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
>
> Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
> Reported-by: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Tested-by: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
> Cc: Konrad Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Cc: Mukesh Ojha <mojha@xxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Jiri Kosina <jkosina@xxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Micheal Kelley <michael.h.kelley@xxxxxxxxxxxxx>
> Cc: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> kernel/cpu.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>