Re: [patch 22/37] arm64: smp: Switch to hotplug core state synchronization

From: Mark Rutland
Date: Mon Apr 17 2023 - 11:57:12 EST


On Sat, Apr 15, 2023 at 01:44:49AM +0200, Thomas Gleixner wrote:
> Switch to the CPU hotplug core state tracking and synchronization
> mechanim. No functional change intended.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx

I gave this a spin on arm64 (in a 64-vCPU VM on an M1 host), and it seems to
work fine with a bunch of vCPUs being hotplugged off and on again randomly.

FWIW:

Tested-by: Mark Rutland <mark.rutland@xxxxxxx>

I also hacked the code to have the dying CPU spin forever before the call to
cpuhp_ap_report_dead(). In that case I see a warning, and that we don't call
arch_cpuhp_cleanup_dead_cpu(), and that the CPU is marked as offline (per
/sys/devices/system/cpu/$N/online).

As a tangent/aside, we might need to improve that for confidential compute
architectures, and we might want to generically track cpus which might still be
using kernel text/data. On arm64 we ensure that via our cpu_kill() callback
(which'll use PSCI CPU_AFFINITY_INFO), but I'm not sure if TDX and/or SEV-SNP
have a similar mechanism.

Otherwise, a malicious hypervisor can pause a vCPU just before it leaves the
kernel (e.g. immediately after the arch_cpuhp_cleanup_dead_cpu() call), wait
for a kexec (or resuse of stack memroy), and unpause the vCPU to cause things
to blow up.

Thanks,
Mark.

> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/smp.h | 2 +-
> arch/arm64/kernel/smp.c | 14 +++++---------
> 3 files changed, 7 insertions(+), 10 deletions(-)
>
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -216,6 +216,7 @@ config ARM64
> select HAVE_KPROBES
> select HAVE_KRETPROBES
> select HAVE_GENERIC_VDSO
> + select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
> select IRQ_DOMAIN
> select IRQ_FORCED_THREADING
> select KASAN_VMALLOC if KASAN
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_
>
> extern int __cpu_disable(void);
>
> -extern void __cpu_die(unsigned int cpu);
> +static inline void __cpu_die(unsigned int cpu) { }
> extern void cpu_die(void);
> extern void cpu_die_early(void);
>
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -333,17 +333,13 @@ static int op_cpu_kill(unsigned int cpu)
> }
>
> /*
> - * called on the thread which is asking for a CPU to be shutdown -
> - * waits until shutdown has completed, or it is timed out.
> + * Called on the thread which is asking for a CPU to be shutdown after the
> + * shutdown completed.
> */
> -void __cpu_die(unsigned int cpu)
> +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
> {
> int err;
>
> - if (!cpu_wait_death(cpu, 5)) {
> - pr_crit("CPU%u: cpu didn't die\n", cpu);
> - return;
> - }
> pr_debug("CPU%u: shutdown\n", cpu);
>
> /*
> @@ -370,8 +366,8 @@ void cpu_die(void)
>
> local_daif_mask();
>
> - /* Tell __cpu_die() that this CPU is now safe to dispose of */
> - (void)cpu_report_death();
> + /* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
> + cpuhp_ap_report_dead();
>
> /*
> * Actually shutdown the CPU. This must never fail. The specific hotplug
>