Re: [RFC][PATCH] sched: Start stopper early

From: Heiko Carstens
Date: Fri Oct 16 2015 - 04:22:29 EST


On Wed, Oct 07, 2015 at 10:41:10AM +0200, Peter Zijlstra wrote:
> Hi,
>
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop(), with one CPU waiting for the other, which
> never shows up.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active' _before_ we do
> the smpboot_unpark thing, due to ONLINE notifier ordering.
>
> The below test patch manually starts the stopper task early.
>
> It boots and hotplugs a CPU on my test box, so it's not insta-broken.
>
> ---
> kernel/sched/core.c | 7 ++++++-
> kernel/stop_machine.c | 5 +++++
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1764a0f..9a56ef7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
> rq->age_stamp = sched_clock_cpu(cpu);
> }
>
> +extern void cpu_stopper_unpark(unsigned int cpu);
> +
> static int sched_cpu_active(struct notifier_block *nfb,
> unsigned long action, void *hcpu)
> {
> + int cpu = (long)hcpu;
> +
> switch (action & ~CPU_TASKS_FROZEN) {
> case CPU_STARTING:
> set_cpu_rq_start_time();
> return NOTIFY_OK;
> case CPU_ONLINE:
> + cpu_stopper_unpark(cpu);
> /*
> * At this point a starting CPU has marked itself as online via
> * set_cpu_online(). But it might not yet have marked itself
> @@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
> * Thus, fall-through and help the starting CPU along.
> */
> case CPU_DOWN_FAILED:
> - set_cpu_active((long)hcpu, true);
> + set_cpu_active(cpu, true);
> return NOTIFY_OK;
> default:
> return NOTIFY_DONE;
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 12484e5..c674371 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
> .selfparking = true,
> };
>
> +void cpu_stopper_unpark(unsigned int cpu)
> +{
> + kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
> +}
> +

So, actually, this doesn't fix the bug; it _seems_ to still be reproducible.

[ FWIW, I will be offline for the next two weeks ]

The bug was reproduced with your patch applied to 4.2.0 (+ couple of
unrelated internal patches).

In addition I cherry-picked these two upstream commits:
dd9d3843755d ("sched: Fix cpu_active_mask/cpu_online_mask race")
02cb7aa923ec ("stop_machine: Move 'cpu_stopper_task' and 'stop_cpus_work'
into 'struct cpu_stopper'")

The new dump again shows one CPU looping in multi_cpu_stop(), triggered by
stop_two_cpus(); the second CPU will never enter multi_cpu_stop(), since
its cpu_stop_work was never enqueued.

The two cpu_stop_work structures on the stack of the process that invoked
stop_two_cpus() look like this:

crash> struct cpu_stop_work 0x8ad8afa78
struct cpu_stop_work {
  list = {
    next = 0x8ad8afa78,
    prev = 0x8ad8afa78
  },
  fn = 0x2091b0 <multi_cpu_stop>,
  arg = 0x8ad8afac8,
  done = 0x8ad8afaf0
}

crash> struct cpu_stop_work 0x8ad8afaa0
struct cpu_stop_work {
  list = {
    next = 0x0, <---- NULL indicates it was never enqueued
    prev = 0x0
  },
  fn = 0x2091b0 <multi_cpu_stop>,
  arg = 0x8ad8afac8,
  done = 0x8ad8afaf0
}

The corresponding struct cpu_stop_done below shows that cpu_stop_signal_done()
was called for at least one of them: nr_todo, which stop_two_cpus() initializes
to 2, is down to 1. So the working theory is still that cpu_stop_queue_work()
was called while the corresponding stopper was not yet enabled.

crash> struct -x cpu_stop_done 00000008ad8afaf0
struct cpu_stop_done {
  nr_todo = {
    counter = 0x1
  },
  executed = 0x0,
  ret = 0x0,
  completion = {
    done = 0x0,
    wait = {
      lock = {
        {
          rlock = {
            raw_lock = {
              lock = 0x0
            },
            break_lock = 0x0,
            magic = 0xdead4ead,
            owner_cpu = 0xffffffff,
            owner = 0xffffffffffffffff,
            dep_map = {
              key = 0x1e901e0 <__key.5629>,
              class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
              name = 0xb40d0c "&x->wait",
              cpu = 0xb,
              ip = 0x94e5b2
            }
          },
          {
            __padding = "\000\000\000\000\000\000\000\000 ÞN\255\377\377\377\377\377\377\377\377\377\377\377\377",
            dep_map = {
              key = 0x1e901e0 <__key.5629>,
              class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
              name = 0xb40d0c "&x->wait",
              cpu = 0xb,
              ip = 0x94e5b2
            }
          }
        }
      },
      task_list = {
        next = 0x8ad8afa20,
        prev = 0x8ad8afa20
      }
    }
  }
}
