Re: [PATCH] workqueue: Don't double assign worker->sleeping

From: Lai Jiangshan
Date: Tue Mar 31 2020 - 23:23:12 EST


Hello

If I don't miss all the issues you have listed, it is a good and straightforward
fix, but I have concern that cmpxchg_local() might have performance impact
on some non-x86 arch.

The two issues as you have listed:
1) WARN_ON_ONCE() on valid condition when interrupted(async-page-faulted)
2) wq_worker_running() can be interrupted(async-page-faulted in virtual machine)
and nr_running would be decreased twice.

For fixing issue one, we can just remove WARN_ON_ONCE() as this patch.
For fixing issue two, ->sleeping in wq_worker_running() can be checked&modified
under irq-disabled. (we can't use preempt-disabled context here)

thanks,
Lai

On Sat, Mar 28, 2020 at 1:53 AM Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx> wrote:
>
> The kernel test robot triggered a warning with the following race:
> task-ctx interrupt-ctx
> worker
> -> process_one_work()
> -> work_item()
> -> schedule();
> -> sched_submit_work()
> -> wq_worker_sleeping()
> -> ->sleeping = 1
> atomic_dec_and_test(nr_running)
> __schedule(); *interrupt*
> async_page_fault()
> -> local_irq_enable();
> -> schedule();
> -> sched_submit_work()
> -> wq_worker_sleeping()
> -> if (WARN_ON(->sleeping)) return
> -> __schedule()
> -> sched_update_worker()
> -> wq_worker_running()
> -> atomic_inc(nr_running);
> -> ->sleeping = 0;
>
> -> sched_update_worker()
> -> wq_worker_running()
> if (!->sleeping) return
>
> In this context the warning is pointless everything is fine.
>
> However, if the interrupt occurs in wq_worker_sleeping() between reading and
> setting `sleeping' i.e.
>
> | if (WARN_ON_ONCE(worker->sleeping))
> | return;
> *interrupt*
> | worker->sleeping = 1;
>
> then pool->nr_running will be decremented twice in wq_worker_sleeping()
> but it will be incremented only once in wq_worker_running().
>
> Replace the assignment of `sleeping' with a cmpxchg_local() to ensure
> that there is no double assignment of the variable. The variable is only
> accessed from the local CPU. Remove the WARN statement because this
> condition can be valid.
>
> An alternative would be to move `->sleeping' to `->flags' as a new bit
> but this would require to acquire the pool->lock in wq_worker_running().
>
> Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock")
> Link: 20200327074308.GY11705@shao2-debian">https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
> Reported-by: kernel test robot <lkp@xxxxxxxxx>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
> kernel/workqueue.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 4e01c448b4b48..dc477a2a3ce30 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -846,11 +846,10 @@ void wq_worker_running(struct task_struct *task)
> {
> struct worker *worker = kthread_data(task);
>
> - if (!worker->sleeping)
> + if (cmpxchg_local(&worker->sleeping, 1, 0) == 0)
> return;
> if (!(worker->flags & WORKER_NOT_RUNNING))
> atomic_inc(&worker->pool->nr_running);
> - worker->sleeping = 0;
> }
>
> /**
> @@ -875,10 +874,9 @@ void wq_worker_sleeping(struct task_struct *task)
>
> pool = worker->pool;
>
> - if (WARN_ON_ONCE(worker->sleeping))
> + if (cmpxchg_local(&worker->sleeping, 0, 1) == 1)
> return;
>
> - worker->sleeping = 1;
> spin_lock_irq(&pool->lock);
>
> /*
> --
> 2.26.0
>