Re: Converting dev->mutex into dev->spinlock ?

From: Tetsuo Handa
Date: Sat Feb 04 2023 - 12:10:12 EST


On 2023/02/05 1:27, Alan Stern wrote:
> On Sun, Feb 05, 2023 at 01:12:12AM +0900, Tetsuo Handa wrote:
>> On 2023/02/05 0:34, Alan Stern wrote:
>>>> A few examples:
>>>>
>>>> https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
>>>
>>> It's hard to figure out what's wrong from looking at the syzbot report.
>>> What makes you think it is connected with dev->mutex?
>>>
>>> At first glance, it seems that the ath6kl driver is trying to flush a
>>> workqueue while holding a lock or mutex that is needed by one of the
>>> jobs in the workqueue. That's obviously never going to work, no matter
>>> what sort of lockdep validation gets used.
>>
>> That lock is exactly dev->mutex, for which lockdep validation is disabled.
>> If lockdep validation on dev->mutex were not disabled, we could catch the
>> possibility of a deadlock before khungtaskd reports the real deadlock as a
>> hung task.
>>
>> Having lockdep validation disabled on dev->mutex is really annoying, and
>> I want to enable lockdep validation on dev->mutex; that is the purpose of
>> the "drivers/core: Remove lockdep_set_novalidate_class() usage" patch.
>
>> Even if it is always safe to acquire a child device's lock while holding
>> the parent's lock, completely disabling lockdep checks on a device's lock
>> is not safe.
>
> I understand the problem you want to solve, and I understand that it
> can be frustrating. However, I do not believe you will be able to
> solve this problem.

That amounts to declaring that driver developers may take it for granted that
driver callback functions can behave as if dev->mutex were not held.

Some developers test their changes with lockdep enabled and conclude that their
changes are correct because lockdep did not complain.
https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174 is an example.

We should find a way to update the driver core code so that lockdep checks can
remain enabled on dev->mutex.
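To make the ath6kl-style pattern quoted above concrete (flushing work while holding a lock that the queued job needs), here is a minimal userspace model, not kernel code. It assumes a pthread mutex as a stand-in for dev->mutex and pthread_join() as a stand-in for flush_workqueue(); all names are hypothetical. In the kernel the job's mutex_lock() would simply block forever, so the circular wait only surfaces when khungtaskd reports it. This model uses a 100 ms timed lock so the program terminates and the circular wait shows up as ETIMEDOUT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for dev->mutex (userspace model only). */
static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Result of the queued job's attempt to take dev_mutex. */
static int job_lock_result = -1;

/* Hypothetical stand-in for a work item that needs dev->mutex.
 * A real work item would call mutex_lock() and block forever; this
 * model gives up after 100 ms so the demonstration terminates. */
static void *queued_job(void *arg)
{
    struct timespec deadline;

    (void)arg;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    job_lock_result = pthread_mutex_timedlock(&dev_mutex, &deadline);
    if (job_lock_result == 0)
        pthread_mutex_unlock(&dev_mutex);
    return NULL;
}

/* Models a driver callback that runs with dev_mutex held and then
 * waits for ("flushes") a job that needs the same mutex.
 * Returns the job's lock result; ETIMEDOUT marks the circular wait
 * that would be a real deadlock in the kernel. */
int flush_while_holding_dev_mutex(void)
{
    pthread_t worker;

    pthread_mutex_lock(&dev_mutex);   /* callback runs with dev->mutex held */
    if (pthread_create(&worker, NULL, queued_job, NULL) != 0) {
        pthread_mutex_unlock(&dev_mutex);
        return -1;
    }
    pthread_join(worker, NULL);       /* models flush_workqueue() */
    pthread_mutex_unlock(&dev_mutex);
    return job_lock_result;
}
```

Nothing in this sequence fails until the wait actually happens, which is why a validator that tracks the "holds dev->mutex, then waits for work that takes dev->mutex" dependency is valuable.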