RE: [PATCH] Fix the race between smp_call_function and CPU booting

From: Peter Zijlstra
Date: Fri Mar 23 2012 - 08:02:24 EST


On Fri, 2012-03-23 at 11:32 +0000, Liu, Chuansheng wrote:
> In fact, I had two scripts running:
> 1/ One script:
> echo 0 > /sys/devices/system/cpu/cpuX/online
> echo 1 > /sys/devices/system/cpu/cpuX/online
> running the above commands in a loop
>
> 2/ Another script:
> echo 1 > /debug/smp_call_test
> usleep 50000
> running the above commands in a loop
>
> With this, the race is easy to reproduce within several minutes.
> To simplify your test to match mine (just two CPUs), you can first set the other non-boot CPUs offline
> and leave just one non-boot CPU online.

So this is exactly what I did and it ran for 30+ minutes without fail. I
found I forgot to log the serial output so I just re-ran this to make
sure. 10+ minutes and not a single WARN in the console output.
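
For reference, the two loops quoted above can be sketched as a pair of shell
functions. This is only a sketch under stated assumptions: the CPU_ONLINE
default (cpu1 under the standard sysfs hotplug directory) and the function
names are mine, /debug/smp_call_test is the reporter's own debug hook and only
exists with his test patch applied, and `sleep 0.05` stands in for
`usleep 50000` on systems whose sleep accepts fractional seconds.

```shell
#!/bin/sh
# Sketch of the two reproduction loops. Paths are overridable assumptions.
# CPU_ONLINE: hotplug control file of the one non-boot CPU left online.
# SMP_TEST:   the reporter's debug hook (hypothetical outside his patch).
CPU_ONLINE=${CPU_ONLINE:-/sys/devices/system/cpu/cpu1/online}
SMP_TEST=${SMP_TEST:-/debug/smp_call_test}

# Loop 1: take the CPU offline and bring it back, $1 times.
hotplug_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo 0 > "$CPU_ONLINE"
        echo 1 > "$CPU_ONLINE"
        i=$((i + 1))
    done
}

# Loop 2: poke the test hook, then wait 50 ms, $1 times.
smp_call_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo 1 > "$SMP_TEST"
        sleep 0.05
        i=$((i + 1))
    done
}
```

Run as root with both loops in the background, for example
`hotplug_loop 10000 & smp_call_loop 10000 & wait`, and watch the console for
the WARN.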

If I pop my change to select_fallback_rq() I can indeed trigger this:

------------[ cut here ]------------
WARNING: at /usr/src/linux-2.6/arch/x86/kernel/smp.c:120 native_smp_send_reschedule+0x5b/0x60()
Hardware name: X8DTN
Modules linked in: [last unloaded: scsi_wait_scan]
Pid: 1542, comm: abrtd Not tainted 3.3.0-01725-gd6eb054-dirty #63
Call Trace:
<IRQ> [<ffffffff810775df>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8107763a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8105f79b>] native_smp_send_reschedule+0x5b/0x60
[<ffffffff810aa67a>] try_to_wake_up+0x1fa/0x2c0
[<ffffffff810acaec>] ? sched_slice.isra.38+0x5c/0x90
[<ffffffff810aa795>] wake_up_process+0x15/0x20
[<ffffffff81085c6e>] process_timeout+0xe/0x10
[<ffffffff81086cb3>] run_timer_softirq+0x143/0x460
[<ffffffff81384a94>] ? timerqueue_add+0x74/0xc0
[<ffffffff81085c60>] ? usleep_range+0x50/0x50
[<ffffffff8107e81d>] __do_softirq+0xbd/0x290
[<ffffffff810c5e64>] ? clockevents_program_event+0x74/0x100
[<ffffffff810c72d4>] ? tick_program_event+0x24/0x30
[<ffffffff8194ba4c>] call_softirq+0x1c/0x30
[<ffffffff810433d5>] do_softirq+0x55/0x90
[<ffffffff8107ed2e>] irq_exit+0x9e/0xe0
[<ffffffff8194c07e>] smp_apic_timer_interrupt+0x6e/0x99
[<ffffffff8194b107>] apic_timer_interrupt+0x67/0x70
<EOI>
---[ end trace d2b2cbf78c1ddd2e ]---


But let me re-run with the select_fallback_rq() change and let it run
for several hours while I go play outside..