[RFC 10/10] arm64: smp: Make __cpu_disable() parallel

From: Pingfan Liu
Date: Sun Aug 21 2022 - 22:17:14 EST


A dying CPU reaches __cpu_disable() through take_cpu_down(). If the
teardown path runs in parallel, several CPUs can execute __cpu_disable()
concurrently, which may corrupt cpu_online_mask and related maps unless
an extra lock protects them.

At present, the cpumask is protected by cpu_add_remove_lock, which is
taken well above __cpu_disable() in the call chain. To protect
__cpu_disable() against parallel execution in the kexec quick reboot
path, introduce a local lock, cpumap_lock.
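
For illustration only, the locking pattern described above can be
modelled in userspace. This is a minimal sketch, not kernel code:
pthreads stand in for dying CPUs, a plain word stands in for the shared
maps, and every fake_* name is hypothetical. It only shows a lock that
is taken conditionally around a shared read-modify-write when teardown
runs in parallel.

```c
/* Userspace sketch only: pthreads stand in for CPUs, a word for the maps. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 8

static pthread_mutex_t fake_cpumap_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long fake_online_mask;  /* shared, non-atomic state */
static bool fake_parallel_teardown;     /* plays the role of kexec_in_progress */

/* Clear one "CPU" from the shared mask, locking only on the parallel path. */
static void fake_cpu_clear_maps(unsigned int cpu)
{
	if (fake_parallel_teardown)
		pthread_mutex_lock(&fake_cpumap_lock);

	fake_online_mask &= ~(1UL << cpu);  /* read-modify-write on shared state */

	if (fake_parallel_teardown)
		pthread_mutex_unlock(&fake_cpumap_lock);
}

/* Thread entry point: one thread per "dying CPU". */
static void *fake_cpu_die(void *arg)
{
	fake_cpu_clear_maps((unsigned int)(unsigned long)arg);
	return NULL;
}

/* Take all fake CPUs down concurrently and return the resulting mask. */
static unsigned long fake_teardown_all(void)
{
	pthread_t tid[NR_FAKE_CPUS];
	unsigned long cpu;

	fake_online_mask = (1UL << NR_FAKE_CPUS) - 1;
	fake_parallel_teardown = true;  /* set before any thread starts */

	for (cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		int err = pthread_create(&tid[cpu], NULL, fake_cpu_die, (void *)cpu);
		assert(err == 0);
	}
	for (cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		pthread_join(tid[cpu], NULL);

	return fake_online_mask;
}
```

Calling fake_teardown_all() from a test harness should return an empty
mask once every thread has been joined. The flag is written before any
thread is created, so each thread sees the same value on both the lock
and the unlock side.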

Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
Cc: Sudeep Holla <sudeep.holla@xxxxxxx>
Cc: Phil Auld <pauld@xxxxxxxxxx>
Cc: Rob Herring <robh@xxxxxxxxxx>
Cc: Ben Dooks <ben-linux@xxxxxxxxx>
To: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
To: linux-kernel@xxxxxxxxxxxxxxx
---
arch/arm64/kernel/smp.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..fee8879048b0 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -287,6 +287,28 @@ static int op_cpu_disable(unsigned int cpu)
return 0;
}

+static DEFINE_SPINLOCK(cpumap_lock);
+
+static void __cpu_clear_maps(unsigned int cpu)
+{
+ /*
+ * On kexec reboot, cpu_add_remove_lock cannot protect these maps.
+ */
+ if (kexec_in_progress)
+ spin_lock(&cpumap_lock);
+ remove_cpu_topology(cpu);
+ numa_remove_cpu(cpu);
+
+ /*
+ * Take this CPU offline. Once we clear this, we can't return,
+ * and we must not schedule until we're ready to give up the cpu.
+ */
+ set_cpu_online(cpu, false);
+ if (kexec_in_progress)
+ spin_unlock(&cpumap_lock);
+
+}
+
/*
* __cpu_disable runs on the processor to be shutdown.
*/
@@ -299,14 +321,7 @@ int __cpu_disable(void)
if (ret)
return ret;

- remove_cpu_topology(cpu);
- numa_remove_cpu(cpu);
-
- /*
- * Take this CPU offline. Once we clear this, we can't return,
- * and we must not schedule until we're ready to give up the cpu.
- */
- set_cpu_online(cpu, false);
+ __cpu_clear_maps(cpu);
ipi_teardown(cpu);

/*
--
2.31.1