Re: [PATCH v6 4/4] workqueue: Unbind kworkers before sending them to exit()

From: Valentin Schneider
Date: Thu Dec 01 2022 - 05:38:50 EST


On 01/12/22 11:01, Lai Jiangshan wrote:
> On Tue, Nov 29, 2022 at 2:31 AM Valentin Schneider <vschneid@xxxxxxxxxx> wrote:
>
>> @@ -3627,8 +3668,11 @@ static bool wq_manager_inactive(struct worker_pool *pool)
>>  static void put_unbound_pool(struct worker_pool *pool)
>>  {
>>          DECLARE_COMPLETION_ONSTACK(detach_completion);
>> +        struct list_head cull_list;
>>          struct worker *worker;
>>
>> +        INIT_LIST_HEAD(&cull_list);
>> +
>>          lockdep_assert_held(&wq_pool_mutex);
>>
>>          if (--pool->refcnt)
>> @@ -3651,17 +3695,19 @@ static void put_unbound_pool(struct worker_pool *pool)
>>           * Because of how wq_manager_inactive() works, we will hold the
>>           * spinlock after a successful wait.
>>           */
>> +        mutex_lock(&wq_pool_attach_mutex);
>>          rcuwait_wait_event(&manager_wait, wq_manager_inactive(pool),
>>                             TASK_UNINTERRUPTIBLE);
>>          pool->flags |= POOL_MANAGER_ACTIVE;
>
> Hello, Valentin
>
> I'm afraid it might deadlock here.
>
> If put_unbound_pool() is called while manage_workers() is sleeping on
> a memory allocation, put_unbound_pool() will grab wq_pool_attach_mutex
> before the manager does, which prevents the manager from taking the
> lock to attach the newly created worker, and the two deadlock.
>

Well spotted, I can see it now.
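
To spell out the cycle as I now understand it (hypothetical trace; the
function names are from kernel/workqueue.c, but the exact sleep point
inside create_worker() is just illustrative):

    manage_workers()                        put_unbound_pool()
    ----------------                        ------------------
    pool->flags |= POOL_MANAGER_ACTIVE;
    create_worker()
      kzalloc(..., GFP_KERNEL); // sleeps
                                            mutex_lock(&wq_pool_attach_mutex);
                                            rcuwait_wait_event(&manager_wait,
                                                wq_manager_inactive(pool), ...);
                                            // sleeps: manager still active
      worker_attach_to_pool()
        mutex_lock(&wq_pool_attach_mutex);
        // sleeps: the mutex is already held

Each side waits on the other, so neither makes progress.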

> I think the mutex_lock(&wq_pool_attach_mutex) can be moved into
> wq_manager_inactive() and handled in the same way as pool->lock.
>

That looks sane enough; I'll try to tweak my tests to get the manager
involved and test this out. Thanks!
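
Something like the below is what I'll try - completely untested, and
the mutex_trylock() is my own reading of what "the same way as
pool->lock" becomes for a sleeping lock evaluated inside the rcuwait
condition:

    /*
     * Untested sketch: grab wq_pool_attach_mutex opportunistically so we
     * never sleep on it from within the rcuwait condition; on any failure,
     * drop whatever we hold and let rcuwait_wait_event() re-evaluate.
     */
    static bool wq_manager_inactive(struct worker_pool *pool)
    {
            if (!mutex_trylock(&wq_pool_attach_mutex))
                    return false;

            raw_spin_lock_irq(&pool->lock);

            if (pool->flags & POOL_MANAGER_ACTIVE) {
                    raw_spin_unlock_irq(&pool->lock);
                    mutex_unlock(&wq_pool_attach_mutex);
                    return false;
            }

            /* As with pool->lock, both locks are held on success. */
            return true;
    }

with the explicit mutex_lock() before rcuwait_wait_event() in
put_unbound_pool() dropped, of course.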