Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)

From: Guenter Roeck
Date: Sat Apr 18 2015 - 21:56:54 EST


On 04/18/2015 05:04 PM, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>> On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:
>>>
>>> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
>>> hard - I don't get any output from the console. Bisect points to commit
>>> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
>>> Reverting this commit fixes the problem.
>
> Hmm. It being qemu, can you look at where it seems to lock?


static void csd_lock_wait(struct call_single_data *csd)
{
+#if 0
	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
		cpu_relax();
+#else
+	pr_info("csd_lock_wait: flags=0x%x\n", smp_load_acquire(&csd->flags));
+#endif
}

prints

csd_lock_wait: flags=0x3

repeatedly for each call to csd_lock_wait() [and, with the wait loop compiled
out, bypasses the hang]. Further debugging shows that wait == 1, and that csd
points to the pre-initialized csd_stack (which has CSD_FLAG_LOCK set).

It seems that CSD_FLAG_LOCK is never reset (there is no call to csd_unlock(), ever).
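For context, here is roughly how the lock/unlock pair fits together in
kernel/smp.c after that commit (paraphrased from memory, not a verbatim copy,
so the details may be slightly off): csd_lock_wait() above can only return
once csd_unlock() has cleared CSD_FLAG_LOCK with a release store.

static void csd_lock(struct call_single_data *csd)
{
	csd_lock_wait(csd);
	csd->flags |= CSD_FLAG_LOCK;

	/* order the flag update before writes to the other csd fields */
	smp_wmb();
}

static void csd_unlock(struct call_single_data *csd)
{
	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));

	/* release store; pairs with the smp_load_acquire() in csd_lock_wait() */
	smp_store_release(&csd->flags, 0);
}

If no CPU ever executes csd_unlock() on a given csd, any synchronous caller
waiting on that csd spins forever.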

Further debugging (with a WARN_ON added in smp_call_function_single() for cpu != 0) shows:

[<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
[<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
[<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
[<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
[<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
[<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
[<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
[<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
[<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
[<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
[<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
[<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
[<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
[<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
---[ end trace 2f9f1bb8a47b3a1b ]---
smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
generic_exec_single, cpu=1, smp_processor_id()=0
csd_lock_wait: csd=87825ea0, flags=0x3

This is repeated for each secondary CPU. But the secondary CPUs never respond
because they are not enabled, which I guess explains why the lock is never
released and csd_lock_wait() spins forever.
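To put the log lines above in context, the wait == 1 path looks roughly like
this (again paraphrased from memory, not a verbatim copy; if I read the new
flag bits right, the flags=0x3 in the log matches CSD_FLAG_LOCK plus
CSD_FLAG_SYNCHRONOUS on the on-stack csd):

int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
			     int wait)
{
	/* the on-stack csd starts out locked */
	struct call_single_data csd_stack = {
		.flags = CSD_FLAG_LOCK | CSD_FLAG_SYNCHRONOUS,
	};
	struct call_single_data *csd = &csd_stack;
	int err;

	/* ... preemption/irq checks and the !wait case elided ... */

	err = generic_exec_single(cpu, csd, func, info);

	/*
	 * Nothing on the never-started cpu 1 ever calls csd_unlock() on
	 * csd_stack, so this is where the boot hangs.
	 */
	if (wait)
		csd_lock_wait(csd);

	return err;
}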

So, in other words, this happens because the system believes (presumably based
on the configuration / fdt data) that there are four CPU cores, but that is not
actually the case. Previously the mismatch did not matter and was handled
correctly; now it is fatal.

Does this help ?

Guenter
