[RFC PATCH v3 0/3] workqueue: destroy_worker() vs isolated CPUs

From: Valentin Schneider
Date: Tue Aug 02 2022 - 04:42:29 EST


Hi folks,

Using a work struct from within the workqueue code itself is a bit scary, but
it seems to be holding up (at the very least on the locking side of things).

Note that this affects all kworkers (not just percpu ones) for the sake of
consistency and to prevent adding extra corner cases. kthread_set_per_cpu(p, -1)
is a no-op for unbound kworkers, and IIUC the affinity change is not required
since unbound workers have to be affined to a subset of wq_unbound_cpumask, but
it shouldn't be harmful either.

3/3 (not for merging!) is a simple and stupid stresser that forces extra pcpu
kworkers to be spawned on a specific CPU - I can then quickly test this on QEMU
by making sure said CPU is isolated on the cmdline.

Thanks to Tejun & Lai for the discussion thus far.

Revisions
=========

RFCv2 -> RFCv3
++++++++++++++

o Rebase onto v5.19
o Add new patch (1/3) around accessing wq_unbound_cpumask

o Prevent WORKER_DIE workers for kfree()'ing themselves before the idle reaper
gets to handle them (Tejun)

Bit of an aside on that: I've been struggling to convince myself this can
happen due to spurious wakeups and would like some help here.

Idle workers are TASK_UNINTERRUPTIBLE, so they can't be woken up by
signals. That state is set *under* pool->lock, and all wakeups (before this
patch) are also done while holding pool->lock.

wake_up_worker() is done under pool->lock AND only wakes a worker on the
pool->idle_list. Thus the to-be-woken worker *cannot* have WORKER_DIE, though
it could gain it *after* being woken but *before* it runs, e.g.:

LOCK pool->lock
wake_up_worker(pool)
wake_up_process(p)
UNLOCK pool->lock
idle_reaper_fn()
LOCK pool->lock
destroy_worker(worker, list);
UNLOCK pool->lock
worker_thread()
goto woke_up;
LOCK pool->lock
READ worker->flags & WORKER_DIE
UNLOCK pool->lock
...
kfree(worker);
reap_worker(worker);
// Uh-oh

... But IMO that's not a spurious wakeup, that's a concurrency issue. I don't
see any spurious/unexpected worker wakeup happening once a worker is off the
pool->idle_list.


RFCv1 -> RFCv2
++++++++++++++

o Change the pool->timer into a delayed_work to have a sleepable context for
unbinding kworkers

Cheers,
Valentin

Valentin Schneider (3):
workqueue: Hold wq_pool_mutex while affining tasks to
wq_unbound_cpumask
workqueue: Unbind workers before sending them to exit()
DEBUG-DO-NOT-MERGE: workqueue: kworker spawner

kernel/Makefile | 2 +-
kernel/workqueue.c | 169 +++++++++++++++++++++++++++++-------
kernel/workqueue_internal.h | 1 +
kernel/wqstress.c | 69 +++++++++++++++
4 files changed, 207 insertions(+), 34 deletions(-)
create mode 100644 kernel/wqstress.c

--
2.31.1