Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

From: Linus Torvalds
Date: Sat Mar 19 2022 - 13:16:25 EST


On Fri, Mar 18, 2022 at 9:43 PM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Since flush operation synchronously waits for completion, flushing
> kernel-global WQs (e.g. system_wq) might introduce possibility of deadlock
> due to unexpected locking dependency. Tejun Heo commented that it makes no
> sense at all to call flush_workqueue() on the shared WQs as the caller has
> no idea what it's gonna end up waiting for.

NAK on this patch for a very simple reason:

static inline void flush_scheduled_work(void)
{
flush_workqueue(system_wq);
}

and now grep for flush_scheduled_work().

The *other* system workqueue flushes may be rare and "that subsystem
should just be converted to do its own workqueue".

But flush_scheduled_work() is literally exported as an explicit and
separate interface,

The fact that the function has a big warning in the comment above it
doesn't change that fact. At the very least, this patch has to be
preceded by a couple of other patches that fix a couple of subsystems
and document "this is what you should do".

Because suddenly saying "hey, we gave you this interface, now we're
warning about it because it's going to go away" without actually
showing how to do it instead is not acceptable.

And honestly, I don't personally see a good conversion. We literally
have all those "schedule_{delayed_}work{_on}()" etc helper functions
that are *designed* to use this system_wq. People *need* the ability
to flush those things, even if it's only for module unload.

So I really think this patch on its own is completely broken. It'd not
pointing to a way forward, it's just saying "don't do this" with no
really acceptable way to not do it.

Removing flush_scheduled_work() needs to be paired with removing the
"schedule_{delayed_}work{_on}()" helpers too.

And I don't see you having a good alternative.

So until that clear way forward exists, NAK.

Linus