Re: [PATCH] workqueue: don't skip lockdep wq dependency in cancel_work_sync()

From: Tejun Heo
Date: Thu Jul 28 2022 - 12:40:25 EST


On Thu, Jul 28, 2022 at 09:23:25PM +0900, Tetsuo Handa wrote:
> Like Hillf Danton mentioned
>
> syzbot should have been able to catch cancel_work_sync() in work context
> by checking lockdep_map in __flush_work() for both flush and cancel.
>
> in [1], being unable to report an obvious deadlock scenario shown below is
> broken. From locking dependency perspective, sync version of cancel request
> should behave as if flush request, for it waits for completion of work if
> that work has already started execution.
>
> ----------
> #include <linux/module.h>
> #include <linux/sched.h>
> static DEFINE_MUTEX(mutex);
> static void work_fn(struct work_struct *work)
> {
> schedule_timeout_uninterruptible(HZ / 5);
> mutex_lock(&mutex);
> mutex_unlock(&mutex);
> }
> static DECLARE_WORK(work, work_fn);
> static int __init test_init(void)
> {
> schedule_work(&work);
> schedule_timeout_uninterruptible(HZ / 10);
> mutex_lock(&mutex);
> cancel_work_sync(&work);
> mutex_unlock(&mutex);
> return -EINVAL;
> }
> module_init(test_init);
> MODULE_LICENSE("GPL");
> ----------
>
> Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@xxxxxxxx [1]
> Reported-by: Hillf Danton <hdanton@xxxxxxxx>
> Fixes: d6e89786bed977f3 ("workqueue: skip lockdep wq dependency in cancel_work_sync()")

Tetsuo, you gotta explain why this is okay w.r.t. the spurious warnings that
the above commit addressed. You can't just state that there are cases which
are missed and then revert it.

Thanks.

--
tejun