Re: [PATCH v2] stop_machine: Disable preemption when waking two stopper threads

From: Peter Zijlstra
Date: Mon Jul 02 2018 - 08:15:13 EST


On Fri, Jun 29, 2018 at 01:55:12PM -0700, Isaac J. Manjarres wrote:
> When cpu_stop_queue_two_works() begins to wake the stopper
> threads, it does so without preemption disabled, which leads
> to the following race condition:
>
> The source CPU calls cpu_stop_queue_two_works(), with cpu1
> as the source CPU, and cpu2 as the destination CPU. When
> adding the stopper threads to the wake queue used in this
> function, the source CPU stopper thread is added first,
> and the destination CPU stopper thread is added last.
>
> When wake_up_q() is invoked to wake the stopper threads, the
> threads are woken up in the order that they are queued in,
> so the source CPU's stopper thread is woken up first, and
> it preempts the thread running on the source CPU.
>
> The stopper thread will then execute on the source CPU,
> disable preemption, and begin executing multi_cpu_stop(),
> and wait for an ack from the destination CPU's stopper thread,
> with preemption still disabled. Since the worker thread that
> woke up the stopper thread on the source CPU is affine to the
> source CPU, and preemption is disabled on the source CPU, that
> thread will never run to dequeue the destination CPU's stopper
> thread from the wake queue, and thus, the destination CPU's
> stopper thread will never run, causing the source CPU's stopper
> thread to wait forever, and stall.
>
> Disable preemption when waking the stopper threads in
> cpu_stop_queue_two_works() to ensure that the worker thread
> that is waking up the stopper threads isn't preempted
> by the source CPU's stopper thread, and permanently
> scheduled out, leaving the remaining stopper thread asleep
> in the wake queue.
>
> Co-developed-by: Pavankumar Kondeti <pkondeti@xxxxxxxxxxxxxx>
> Signed-off-by: Prasad Sodagudi <psodagud@xxxxxxxxxxxxxx>
> Signed-off-by: Pavankumar Kondeti <pkondeti@xxxxxxxxxxxxxx>
> Signed-off-by: Isaac J. Manjarres <isaacm@xxxxxxxxxxxxxx>

That SoB chain is broken, if Prasad wrote the ptch then there needs to
be a From: line somewhere.

But yes, that looks about right.

> ---
> kernel/stop_machine.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index f89014a..1ff523d 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -270,7 +270,11 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
> goto retry;
> }
>
> - wake_up_q(&wakeq);
> + if (!err) {
> + preempt_disable();
> + wake_up_q(&wakeq);
> + preempt_enable();
> + }
>
> return err;
> }
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>