Re: [PATCH] workqueue: Use private WQ for schedule_on_each_cpu() API

From: Tetsuo Handa
Date: Wed Feb 23 2022 - 17:26:48 EST


On 2022/02/24 6:33, Tejun Heo wrote:
> On Wed, Feb 23, 2022 at 09:57:27AM +0900, Tetsuo Handa wrote:
>> On 2022/02/23 2:29, Tejun Heo wrote:
>>> On Mon, Feb 21, 2022 at 07:38:09PM +0900, Tetsuo Handa wrote:
>>>> Since schedule_on_each_cpu() calls schedule_work_on() and flush_work(),
>>>> we should avoid using system_wq in order to avoid unexpected locking
>>>> dependency.
>>>
>>> I don't get it. schedule_on_each_cpu() is flushing each work item and thus
>>> shouldn't need its own flushing domain. What's this change for?
>>
>> A kernel test robot tested "[PATCH v2] workqueue: Warn flush attempt using
>> system-wide workqueues" on 5.16.0-06523-g29bd199e4e73 and hit a lockdep
>> warning ( https://lkml.kernel.org/r/20220221083358.GC835@xsang-OptiPlex-9020 ).
>>
>> Although the circular locking dependency itself needs to be handled by
>> lockless console printing support, we won't be able to apply
>> "[PATCH v2] workqueue: Warn flush attempt using system-wide workqueues"
>> if schedule_on_each_cpu() continues using system-wide workqueues.
>
> The patch seems pretty wrong. What's problematic is system workqueue flushes
> (which flushes the entire workqueue), not work item flushes.

Why? My understanding is that

flushing a workqueue waits for completion of all work items in that workqueue

flushing a work item waits for completion of that work item on the
workqueue that was specified at queue_work() time

and

if a work item in some workqueue is blocked by other work in that workqueue
(e.g. max_active limit, work items on that workqueue and locks they need),
it has a risk of deadlock.

Then, how can flushing a work item using system-wide workqueues be free of deadlock risk?
Isn't it just "unlikely to deadlock" rather than "impossible to deadlock"?
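
To make the scenario I have in mind concrete, here is a minimal userspace
sketch. It is only a model, not kernel code: a single-threaded
ThreadPoolExecutor stands in for a workqueue whose concurrency is capped at
one (an assumption; it does not model the kernel's concurrency-managed worker
pools or rescuers), and the names work_a, work_b and flush_one_item are
hypothetical. The flusher holds a lock that an unrelated work item needs, and
then waits for its own work item, which is queued behind the unrelated one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FlushTimeout


def flush_one_item(use_private_wq, timeout=0.5):
    """Return True if waiting for our own work item completes in time.

    Model only: each executor is a stand-in for a workqueue that can run
    a single work item at a time (assumed, not actual kernel behavior).
    """
    shared_wq = ThreadPoolExecutor(max_workers=1)
    private_wq = ThreadPoolExecutor(max_workers=1)
    lock = threading.Lock()

    def work_a():
        # Unrelated work item that needs a lock held by the flusher.
        with lock:
            pass

    def work_b():
        # Our own work item; it needs no lock at all.
        return "done"

    target_wq = private_wq if use_private_wq else shared_wq

    with lock:  # the flusher holds the lock while it waits
        shared_wq.submit(work_a)           # queued first, by someone else
        fut_b = target_wq.submit(work_b)   # the item we are going to wait for
        try:
            fut_b.result(timeout=timeout)  # stand-in for flush_work()
            completed = True
        except FlushTimeout:
            # Without the timeout, this wait would never return.
            completed = False

    # The lock is released here, so work_a (and then work_b) can finish.
    shared_wq.shutdown(wait=True)
    private_wq.shutdown(wait=True)
    return completed


if __name__ == "__main__":
    print("shared queue :", flush_one_item(use_private_wq=False))
    print("private queue:", flush_one_item(use_private_wq=True))
```

In this model, waiting for a single work item on the shared queue stalls even
though that work item itself takes no lock, while the same wait on a private
queue completes. Whether the same dependency can arise on system_wq depends on
how many work items can actually run concurrently there, which is exactly the
part I am asking about.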