Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)

From: Rabin Vincent
Date: Sat Apr 18 2015 - 23:40:10 EST


On Sat, Apr 18, 2015 at 06:56:02PM -0700, Guenter Roeck wrote:
> Further debugging (with added WARN_ON if cpu != 0 in smp_call_function_single) shows:
>
> [<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
> [<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
> [<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
> [<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
> [<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
> [<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
> [<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
> [<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
> [<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
> [<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
> [<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
> [<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
> [<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
> [<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
> ---[ end trace 2f9f1bb8a47b3a1b ]---
> smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
> generic_exec_single, cpu=1, smp_processor_id()=0
> csd_lock_wait: csd=87825ea0, flags=0x3
>
> This is repeated for each secondary CPU. But the secondary CPUs don't respond because
> they are not enabled, which I guess explains why the lock is never released.
>
> So, in other words, this happens because the system believes (presumably per configuration
> / fdt data) that there are four CPU cores, but that is not really the case. Previously that
> did not matter, and was handled correctly. Now it is fatal.
>
> Does this help ?

8053871d0f7f67c7efb7f226ef031f78877d6625 moved the csd locking to the
callers, but the offline check, which was earlier done before the csd
locking, was not moved. The following moves the checks to the earlier
point fixes your boot:

diff --git a/kernel/smp.c b/kernel/smp.c
index 2aaac2c..ba1fb01 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -159,9 +159,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
}


- if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
- return -ENXIO;
-
csd->func = func;
csd->info = info;

@@ -289,6 +286,12 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
&& !oops_in_progress);

+ if (cpu != smp_processor_id()
+ && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
+ err = -ENXIO;
+ goto out;
+ }
+
csd = &csd_stack;
if (!wait) {
csd = this_cpu_ptr(&csd_data);
@@ -300,6 +303,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
if (wait)
csd_lock_wait(csd);

+out:
put_cpu();

return err;
@@ -328,6 +332,12 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)

preempt_disable();

+ if (cpu != smp_processor_id()
+ && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
+ err = -ENXIO;
+ goto out;
+ }
+
/* We could deadlock if we have to wait here with interrupts disabled! */
if (WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK))
csd_lock_wait(csd);
@@ -336,8 +346,9 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
smp_wmb();

err = generic_exec_single(cpu, csd, csd->func, csd->info);
- preempt_enable();

+out:
+ preempt_enable();
return err;
}
EXPORT_SYMBOL_GPL(smp_call_function_single_async);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/