Re: [RFC PATCH v2 1/2] workqueue: Unbind workers before sending them to exit()

From: Valentin Schneider
Date: Thu Jul 28 2022 - 13:24:26 EST


On 28/07/22 06:35, Tejun Heo wrote:
> On Thu, Jul 28, 2022 at 11:54:19AM +0100, Valentin Schneider wrote:
>> On 28/07/22 01:13, Lai Jiangshan wrote:
>> > system_unbound_wq doesn't have a rescuer.
>> >
>> > A new workqueue with a rescuer needs to be created and used for
>> > this purpose.
>> >
>>
>> Right, I think it makes sense for those work items to be attached to a
>> WQ_MEM_RECLAIM workqueue. Should I add that as a workqueue-internal
>> thing?
>
> I don't understand why this would need MEM_RECLAIM when it isn't sitting in
> the memory reclaim path. Nothing in mm side can wait on this.
>

From a cursory read of the doc, I thought WQ_MEM_RECLAIM was for anything
that directly or indirectly helps with reclaiming memory (not only work
explicitly sitting in some *mm reclaim* path), and I assumed freeing up a
worker would count as that - but that's the understanding of someone who
doesn't know much about all that :-)

>> > Since WORKER_DIE is set, the worker can possibly be freed now
>> > if there is another source to wake it up.
>> >
>>
>> My understanding for having reap_worker() be "safe" to use outside of
>> raw_spin_lock_irq(pool->lock) is that pool->idle_list is never accessed
>> outside of the pool->lock, and wake_up_worker() only wakes a worker that
>> is in that list. So with destroy_worker() detaching the worker from
>> pool->idle_list under pool->lock, I'm not aware of a codepath other than
>> reap_worker() that could wake it up.
>
> There actually are spurious wakeups. We can't depend on there being no
> wakeups other than ours.
>

Myes, I suppose if a to-be-destroyed kworker spuriously wakes before having
been unbound, then the unbinding no longer buys us much: the harm is done
and the kworker can go straight to do_exit(). Arguably we could still
reduce the harm by moving it away, so let me see what I can do here.
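
To make the hazard concrete, here is a rough userspace analogy using
pthreads. It is not kernel code and not what the patch does. The names
(fake_worker, stray_wakeup_demo, the "die" flag standing in for
WORKER_DIE) are all made up for illustration. It only shows that once the
die flag is set under the lock, any wakeup, not just the reaper's, is
enough for the worker to exit:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kworker; not the kernel struct. */
struct fake_worker {
	pthread_mutex_t lock;   /* plays the role of pool->lock */
	pthread_cond_t  wake;   /* plays the role of the worker's wakeup */
	bool die;               /* stands in for WORKER_DIE */
	bool reaper_woke;       /* set only by the reaper's own wakeup */
	bool exited_before_reaper_wake;
};

static void *fake_worker_fn(void *arg)
{
	struct fake_worker *w = arg;

	pthread_mutex_lock(&w->lock);
	/* Sleep until told to die; any wakeup makes us recheck the flag. */
	while (!w->die)
		pthread_cond_wait(&w->wake, &w->lock);
	assert(w->die);
	/* Record whether we got here without the reaper's wakeup. */
	w->exited_before_reaper_wake = !w->reaper_woke;
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Returns true if the worker exited without the reaper ever waking it. */
static bool stray_wakeup_demo(void)
{
	struct fake_worker w = { .die = false, .reaper_woke = false };
	pthread_t t;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.wake, NULL);
	if (pthread_create(&t, NULL, fake_worker_fn, &w) != 0)
		return false;

	/* Mark the worker as dying under the lock, then drop the lock. */
	pthread_mutex_lock(&w.lock);
	w.die = true;
	pthread_mutex_unlock(&w.lock);

	/* A wakeup from "another source", before the reaper does anything. */
	pthread_cond_signal(&w.wake);

	/* The worker exits on its own; the reaper never sent its wakeup. */
	pthread_join(t, NULL);

	pthread_cond_destroy(&w.wake);
	pthread_mutex_destroy(&w.lock);
	return w.exited_before_reaper_wake;
}
```

In this toy version the worker lives on the caller's stack so nothing is
actually freed, but the ordering is the point: anything the reaper wants to
do to the worker after dropping the lock can race with the worker exiting.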

> Thanks.
>
> --
> tejun