Re: [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()

From: Andrew Morton
Date: Wed Jul 29 2009 - 19:31:49 EST


On Wed, 29 Jul 2009 15:57:51 +0800
Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx> wrote:

> There is a race between generic_smp_call_function_*() and hotplug_cfd()
> in many cases; see the examples below:
>
> 1: hotplug_cfd() can free cfd->cpumask while the CPU's cfd is still on
> the call_function list, which crashes the system:
>
> CPU A:                              CPU B:
>
> smp_call_function_many()            ......
> cpu_down()                          ......
> hotplug_cfd() ->                    ......
>   free_cpumask_var(cfd->cpumask)    (receive function IPI interrupt)
>                                     /* read cfd->cpumask */
>                                     generic_smp_call_function_interrupt() ->
>                                       cpumask_test_and_clear_cpu(cpu, data->cpumask)
>
> CRASH!!!
>
> 2: The call_function list is not handled when a CPU goes down. This
> leads to a dead-wait if another path is waiting for that CPU to execute
> a function:
>
> CPU A:                              CPU B:
>
> smp_call_function_many(wait=0)
> ......                              CPU B down
> smp_call_function_many() -->        (cpu down before receiving the
>   csd_lock(&data->csd);              function IPI interrupt)
>
> DEAD-WAIT!!!!
>
> So CPU A will dead-wait in csd_lock(). smp_call_function_single() has
> the same problem.
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>
> ---
> kernel/smp.c | 140 ++++++++++++++++++++++++++++++++-------------------------
> 1 files changed, 79 insertions(+), 61 deletions(-)
>

It was unfortunate that this patch moved a screenful of code around and
changed that code at the same time - it makes it hard to see what the
functional change was.

So I split this patch into two. The first patch simply moves
hotplug_cfd() to the end of the file and the second makes the
functional changes. The second patch is below, for easier review.

Do we think that this patch should be merged into 2.6.31? 2.6.30.x?



From: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>

There is a race between generic_smp_call_function_*() and hotplug_cfd() in
many cases; see the examples below:

1: hotplug_cfd() can free cfd->cpumask while the CPU's cfd is still on
the call_function list, which crashes the system:

CPU A:                              CPU B:

smp_call_function_many()            ......
cpu_down()                          ......
hotplug_cfd() ->                    ......
  free_cpumask_var(cfd->cpumask)    (receive function IPI interrupt)
                                    /* read cfd->cpumask */
                                    generic_smp_call_function_interrupt() ->
                                      cpumask_test_and_clear_cpu(cpu, data->cpumask)

CRASH!!!
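To make the ordering the fix relies on concrete, here is a minimal userspace model in C. It is not kernel code: struct cfd_model, ipi_handle(), cpu_dead_try_free() and race1_demo() are hypothetical names, and a plain boolean stands in for the csd lock. It assumes the dying CPU queued a wait=0 request to CPUs 1 and 2 just before going down, and shows that the CPU_DEAD path may free the cpumask only once the csd is unlocked, because every remaining target still reads the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4

/*
 * Hypothetical userspace model, not kernel code: one request queued by
 * the CPU that is going down. cpumask stands in for its cpumask_var_t,
 * refs for the targets that have not handled the IPI yet, and locked
 * for the csd lock flag.
 */
struct cfd_model {
	bool *cpumask;
	int refs;
	bool locked;
};

/* A target CPU handling the IPI; like the real handler it reads cpumask. */
static void ipi_handle(struct cfd_model *cfd, int cpu)
{
	assert(cfd->cpumask != NULL);   /* reading a freed mask is case 1 */
	if (!cfd->cpumask[cpu])
		return;
	cfd->cpumask[cpu] = false;
	if (--cfd->refs == 0)
		cfd->locked = false;    /* last target unlocks the csd */
}

/*
 * Models the patched CPU_DEAD ordering: wait for the csd lock (as
 * csd_lock_wait() does) before freeing the mask. Returns false where
 * the real code would still be spinning.
 */
static bool cpu_dead_try_free(struct cfd_model *cfd)
{
	if (cfd->locked)
		return false;
	free(cfd->cpumask);
	cfd->cpumask = NULL;
	return true;
}

/* Returns how many free attempts had to wait before the mask was freed. */
int race1_demo(void)
{
	struct cfd_model cfd = { .refs = 2, .locked = true };
	int waits = 0;

	cfd.cpumask = calloc(NR_CPUS, sizeof(bool));
	if (!cfd.cpumask)
		return -1;
	cfd.cpumask[1] = cfd.cpumask[2] = true;

	if (!cpu_dead_try_free(&cfd))   /* CPUs 1 and 2 still pending */
		waits++;
	ipi_handle(&cfd, 1);
	if (!cpu_dead_try_free(&cfd))   /* CPU 2 still pending */
		waits++;
	ipi_handle(&cfd, 2);            /* last reader of cpumask */
	if (!cpu_dead_try_free(&cfd))   /* now safe to free */
		waits++;
	return waits;
}
```

In this model race1_demo() returns 2: the first two free attempts block, and the mask is released only after its last reader has finished with it.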

2: The call_function list is not handled when a CPU goes down. This leads
to a dead-wait if another path is waiting for that CPU to execute a
function:

CPU A:                              CPU B:

smp_call_function_many(wait=0)
......                              CPU B down
smp_call_function_many() -->        (cpu down before receiving the
  csd_lock(&data->csd);              function IPI interrupt)

DEAD-WAIT!!!!

So CPU A will dead-wait in csd_lock(). smp_call_function_single() has the
same problem.
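A similarly hedged sketch of case 2, again a userspace model rather than the kernel implementation (struct call_model, handle_for_cpu() and race2_demo() are hypothetical names). It loosely mirrors the run_callbacks parameter the patch introduces: on CPU_DEAD the dead CPU's share of a pending request is accounted for without running the callback, which releases the lock that CPU A's next csd_lock() is waiting on. The scenario assumes CPU A targets CPUs 1 and 2 with wait=0, CPU 1 handles the IPI, and CPU 2 goes down first.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/*
 * Hypothetical userspace model, not kernel code: CPU A's request on the
 * call_function list. mask holds targets that still owe a response, refs
 * counts them, and locked stands in for the csd lock flag.
 */
struct call_model {
	bool mask[NR_CPUS];
	int refs;
	bool locked;
};

/*
 * Loosely follows __generic_smp_call_function_interrupt(cpu, run_callbacks)
 * for one request. Returns 1 if the callback would have run, 0 otherwise.
 */
static int handle_for_cpu(struct call_model *c, int cpu, bool run_callbacks)
{
	int ran = 0;

	if (!c->mask[cpu])
		return 0;
	c->mask[cpu] = false;
	if (run_callbacks)
		ran = 1;                /* the real handler calls the function here */
	if (--c->refs == 0)
		c->locked = false;      /* csd unlock: a later csd_lock() proceeds */
	return ran;
}

/* Returns true if CPU A's next csd_lock() would spin forever. */
bool race2_demo(bool apply_fix)
{
	struct call_model c = { .mask = { [1] = true, [2] = true },
				.refs = 2, .locked = true };
	int ran = 0;

	ran += handle_for_cpu(&c, 1, true);          /* CPU 1 runs the callback */
	if (apply_fix)                               /* CPU_DEAD for CPU 2 */
		ran += handle_for_cpu(&c, 2, false); /* account, but don't run */

	assert(ran == 1);   /* the dead CPU never runs the callback */
	return c.locked;    /* still locked means csd_lock() dead-waits */
}
```

Here race2_demo(false) returns true (CPU A would dead-wait), while race2_demo(true) returns false. Skipping the callback rather than running it on the dead CPU's behalf is the design choice the run_callbacks == 0 calls in hotplug_cfd() express.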

Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Jens Axboe <jens.axboe@xxxxxxxxxx>
Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

kernel/smp.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)

diff -puN kernel/smp.c~generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd kernel/smp.c
--- a/kernel/smp.c~generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd
+++ a/kernel/smp.c
@@ -116,14 +116,10 @@ void generic_exec_single(int cpu, struct
 		csd_lock_wait(data);
 }
 
-/*
- * Invoked by arch to handle an IPI for call function. Must be called with
- * interrupts disabled.
- */
-void generic_smp_call_function_interrupt(void)
+static void
+__generic_smp_call_function_interrupt(int cpu, int run_callbacks)
 {
 	struct call_function_data *data;
-	int cpu = smp_processor_id();
 
 	/*
 	 * Ensure entry is visible on call_function_queue after we have
@@ -169,12 +165,18 @@ void generic_smp_call_function_interrupt
 }
 
 /*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+ * Invoked by arch to handle an IPI for call function. Must be called with
+ * interrupts disabled.
  */
-void generic_smp_call_function_single_interrupt(void)
+void generic_smp_call_function_interrupt(void)
+{
+	__generic_smp_call_function_interrupt(smp_processor_id(), 1);
+}
+
+static void
+__generic_smp_call_function_single_interrupt(int cpu, int run_callbacks)
 {
-	struct call_single_queue *q = &__get_cpu_var(call_single_queue);
+	struct call_single_queue *q = &per_cpu(call_single_queue, cpu);
 	unsigned int data_flags;
 	LIST_HEAD(list);
 
@@ -205,6 +207,15 @@ void generic_smp_call_function_single_in
 	}
 }
 
+/*
+ * Invoked by arch to handle an IPI for call function single. Must be
+ * called from the arch with interrupts disabled.
+ */
+void generic_smp_call_function_single_interrupt(void)
+{
+	__generic_smp_call_function_single_interrupt(smp_processor_id(), 1);
+}
+
 static DEFINE_PER_CPU(struct call_single_data, csd_data);
 
 /*
@@ -456,6 +467,7 @@ static int
 hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
 	long cpu = (long)hcpu;
+	unsigned long flags;
 	struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
 
 	switch (action) {
@@ -472,6 +484,12 @@ hotplug_cfd(struct notifier_block *nfb,
 
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
+		local_irq_save(flags);
+		__generic_smp_call_function_interrupt(cpu, 0);
+		__generic_smp_call_function_single_interrupt(cpu, 0);
+		local_irq_restore(flags);
+
+		csd_lock_wait(&cfd->csd);
 		free_cpumask_var(cfd->cpumask);
 		break;
 #endif
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/