Re: [RFC PATCH v2 1/2] workqueue: Unbind workers before sending them to exit()

From: Lai Jiangshan
Date: Wed Jul 27 2022 - 14:13:19 EST


Quick review before going to sleep.

On Wed, Jul 27, 2022 at 7:54 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote:
>
> It has been reported that isolated CPUs can suffer from interference due to
> per-CPU kworkers waking up just to die.
>
> A surge of workqueue activity during initial setup of a latency-sensitive
> application (refresh_vm_stats() being one of the culprits) can cause extra
> per-CPU kworkers to be spawned. Then, said latency-sensitive task can be
> running merrily on an isolated CPU only to be interrupted sometime later by
> a kworker marked for death (cf. IDLE_WORKER_TIMEOUT, 5 minutes after last
> kworker activity).
>
> Prevent this by affining kworkers to the wq_unbound_cpumask (which doesn't
> contain isolated CPUs, cf. HK_TYPE_WQ) before waking them up after marking
> them with WORKER_DIE.
>
> Changing the affinity does require a sleepable context, so get rid of the
> pool->idle_timer and use a delayed_work instead.
>
> Signed-off-by: Valentin Schneider <vschneid@xxxxxxxxxx>
> ---
> kernel/workqueue.c | 109 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 81 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..27642166dcc5 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -167,9 +167,9 @@ struct worker_pool {
> int nr_workers; /* L: total number of workers */
> int nr_idle; /* L: currently idle workers */
>
> - struct list_head idle_list; /* L: list of idle workers */
> - struct timer_list idle_timer; /* L: worker idle timeout */
> - struct timer_list mayday_timer; /* L: SOS timer for workers */
> + struct list_head idle_list; /* L: list of idle workers */
> + struct delayed_work idle_reaper_work; /* L: worker idle timeout */
> + struct timer_list mayday_timer; /* L: SOS timer for workers */
>
> /* a workers is either on busy_hash or idle_list, or the manager */
> DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
> @@ -1806,8 +1806,10 @@ static void worker_enter_idle(struct worker *worker)
> /* idle_list is LIFO */
> list_add(&worker->entry, &pool->idle_list);
>
> - if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
> - mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
> + if (too_many_workers(pool) && !delayed_work_pending(&pool->idle_reaper_work))
> + mod_delayed_work(system_unbound_wq,
> + &pool->idle_reaper_work,
> + IDLE_WORKER_TIMEOUT);

system_unbound_wq doesn't have a rescuer.

A new workqueue with a rescuer needs to be created and used for
this purpose.
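
Something along these lines, as an untested sketch only. The
idle_reaper_wq variable and its "kworker_reaper" name are made up for
illustration, and placing the allocation next to the system workqueues
in workqueue_init_early() is an assumption. WQ_MEM_RECLAIM is the flag
that gives a workqueue a rescuer:

```c
/* Hypothetical dedicated workqueue; not part of the posted patch. */
static struct workqueue_struct *idle_reaper_wq;

/* Assumed location: alongside the system workqueues in workqueue_init_early(). */
idle_reaper_wq = alloc_workqueue("kworker_reaper",
				 WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
BUG_ON(!idle_reaper_wq);

/* In worker_enter_idle(), queue on the dedicated wq instead of system_unbound_wq. */
if (too_many_workers(pool) && !delayed_work_pending(&pool->idle_reaper_work))
	mod_delayed_work(idle_reaper_wq,
			 &pool->idle_reaper_work,
			 IDLE_WORKER_TIMEOUT);
```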

>
> /* Sanity check nr_running. */
> WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
> @@ -1972,9 +1974,29 @@ static struct worker *create_worker(struct worker_pool *pool)
> return NULL;
> }
>
> +static void unbind_worker(struct worker *worker)
> +{
> + kthread_set_per_cpu(worker->task, -1);
> + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
> +}
> +
> +static void rebind_worker(struct worker *worker, struct worker_pool *pool)
> +{
> + kthread_set_per_cpu(worker->task, pool->cpu);
> + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
> +}
> +
> +static void reap_worker(struct worker *worker)
> +{
> + list_del_init(&worker->entry);
> + unbind_worker(worker);
> + wake_up_process(worker->task);


Since WORKER_DIE is already set, the worker can exit and free itself
if anything else wakes it up before this point, in which case the
accesses to worker->task here would be a use-after-free.

I think reverting part of commit 60f5a4bcf852 ("workqueue: async
worker destruction") so that destroy_worker() uses kthread_stop()
again would be a good idea.