Re: [RFC PATCH] workqueue: handle CPU hotplug without updating worker pool's attrs

From: Lai Jiangshan
Date: Thu Dec 10 2020 - 21:31:30 EST

Next message: Joonsoo Kim: "Re: [PATCH v3] mm/page_owner: Record timestamp and pid"
Previous message: Shaokun Zhang: "Re: [PATCH v7] lib: optimize cpumask_local_spread()"
Next in thread: Lai Jiangshan: "Re: [RFC PATCH] workqueue: handle CPU hotplug without updating worker pool's attrs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Dec 9, 2020 at 4:34 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> This makes the cpumask intact for worker pools of unbound workqueues
> when CPUs go offline because we need to rebind workers to the original
> cpumask(of the original pool) when they come back, as represented by
> the cpu check in restore_unbound_workers_cpumask().
>
> Note this is now only for comments.
>
> Signed-off-by: Hillf Danton <hdanton@xxxxxxxx>
> ---
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4117,6 +4117,32 @@ static void wq_update_unbound_numa(struc
> copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
> pwq = unbound_pwq_by_node(wq, node);
>
> + cpumask = pwq->pool->attrs->cpumask;
> + if (!online || !cpumask_test_cpu(cpu, cpumask))
> + return;
> + do {
> + struct worker *worker;
> + int i;
> +
> + for_each_cpu_and(i, cpu_online_mask, cpumask) {
> + if (i != cpu)
> + return;
> + }
> +
> + /*
> + * rebind workers only when the first CPU in
> + * pool's attrs cpumask comes back because scheduler
> + * may have unbound them when the last CPU went offline
> + */
> + mutex_lock(&wq_pool_attach_mutex);
> +
> + for_each_pool_worker(worker, pwq->pool)
> + set_cpus_allowed_ptr(worker->task, cpumask);
> +

There might be multiple pwqs that share the same pool, this line of
code might update the same pool multiple times.

Please keep the logic in restore_unbound_workers_cpumask().

> + mutex_unlock(&wq_pool_attach_mutex);
> + return;
> + } while (0);
> +
> /*
> * Let's determine what needs to be done. If the target cpumask is
> * different from the default pwq's, we need to compare it to @pwq's
> @@ -5004,34 +5030,6 @@ static void rebind_workers(struct worker
> raw_spin_unlock_irq(&pool->lock);
> }
>
> -/**
> - * restore_unbound_workers_cpumask - restore cpumask of unbound workers
> - * @pool: unbound pool of interest
> - * @cpu: the CPU which is coming up
> - *
> - * An unbound pool may end up with a cpumask which doesn't have any online
> - * CPUs. When a worker of such pool get scheduled, the scheduler resets
> - * its cpus_allowed. If @cpu is in @pool's cpumask which didn't have any
> - * online CPU before, cpus_allowed of all its workers should be restored.
> - */
> -static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
> -{
> - static cpumask_t cpumask;
> - struct worker *worker;
> -
> - lockdep_assert_held(&wq_pool_attach_mutex);
> -
> - /* is @cpu allowed for @pool? */
> - if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
> - return;
> -
> - cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);

Good catch of the problem.
But please fix it where the problem is found (here!)
Like this:

+ /* only restore the cpumask of the pool's workers when @cpu is
+ the first online cpu in @pool's cpumask */
+ if (cpumask_weight(cpumask) > 1)
+ return;

/* as we're called from CPU_ONLINE, the following shouldn't fail */
for_each_pool_worker(worker, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0);

> -
> - /* as we're called from CPU_ONLINE, the following shouldn't fail */
> - for_each_pool_worker(worker, pool)
> - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
> -}
> -
> int workqueue_prepare_cpu(unsigned int cpu)
> {
> struct worker_pool *pool;
> @@ -5058,8 +5056,6 @@ int workqueue_online_cpu(unsigned int cp
>
> if (pool->cpu == cpu)
> rebind_workers(pool);
> - else if (pool->cpu < 0)
> - restore_unbound_workers_cpumask(pool, cpu);
>
> mutex_unlock(&wq_pool_attach_mutex);
> }

Next message: Joonsoo Kim: "Re: [PATCH v3] mm/page_owner: Record timestamp and pid"
Previous message: Shaokun Zhang: "Re: [PATCH v7] lib: optimize cpumask_local_spread()"
Next in thread: Lai Jiangshan: "Re: [RFC PATCH] workqueue: handle CPU hotplug without updating worker pool's attrs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]