Re: [syzbot] [net?] [nfc?] INFO: task hung in nfc_targets_found

From: Tetsuo Handa
Date: Thu Jan 04 2024 - 05:35:15 EST


On 2024/01/04 14:05, Hillf Danton wrote:
> On Wed, 03 Jan 2024 16:59:25 -0800
>> HEAD commit: 453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=141bc48de80000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
>> dashboard link: https://syzkaller.appspot.com/bug?extid=2b131f51bb4af224ab40
>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>
>
> syz-executor.1:27827           kworker/u4:93/7607    kworker/0:1/11541
> ===                            ===                   ===
> nci_close_device()             nci_rx_work()         nfc_urelease_event_work()
> mutex_lock(&ndev->req_lock)                          device_lock()
> flush_workqueue(ndev->rx_wq)                         mutex_lock(&ndev->req_lock)
>                                device_lock()
>
> Looks like lockdep failed to detect deadlock once more because of device_lock().

Yes, this is yet another circular locking dependency hidden by device_lock().

Calling flush_workqueue(ndev->rx_wq) while holding ndev->req_lock has to be
avoided, because nci_close_device() creates an ndev->req_lock => dev->dev
dependency (by waiting for nci_rx_work()) and nfc_urelease_event_work()
creates a dev->dev => ndev->req_lock dependency.

nci_close_device() {
  mutex_lock(&ndev->req_lock); // ffff88802bed4350
  flush_workqueue(ndev->rx_wq); // wait for nci_rx_work() to complete
  mutex_unlock(&ndev->req_lock); // ffff88802bed4350
}

nci_rx_work() { // ndev->rx_work is on ndev->rx_wq
  nci_ntf_packet() {
    device_lock(&dev->dev); // ffff88802bed5100
    device_unlock(&dev->dev); // ffff88802bed5100
  }
}

nfc_urelease_event_work() {
  mutex_lock(&nfc_devlist_mutex); // ffffffff8ee4d808
  mutex_lock(&dev->genl_data.genl_data_mutex); // ffff88802bed5508
  nfc_stop_poll() {
    device_lock(&dev->dev); // ffff88802bed5100
    nci_stop_poll() {
      nci_request() {
        mutex_lock(&ndev->req_lock); // ffff88802bed4350
        mutex_unlock(&ndev->req_lock); // ffff88802bed4350
      }
    }
    device_unlock(&dev->dev); // ffff88802bed5100
  }
  mutex_unlock(&dev->genl_data.genl_data_mutex); // ffff88802bed5508
  mutex_unlock(&nfc_devlist_mutex); // ffffffff8ee4d808
}
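
For illustration only, the three traces above can be modelled as a small
dependency graph in userspace C. This is a sketch, not kernel code and not how
lockdep is implemented: the node names, struct dep_graph, add_dep(),
has_cycle() and nfc_traces_have_cycle() are all hypothetical, and treating
"flush_workqueue() waits for the work item" as an ordinary edge is a
simplifying assumption. The track_dev_lock flag mimics whether the validator
can see dev->dev at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node ids for this sketch; these are not kernel identifiers. */
enum node {
    REQ_LOCK,   /* ndev->req_lock */
    RX_WQ,      /* completion of work queued on ndev->rx_wq */
    DEV_LOCK,   /* dev->dev mutex taken by device_lock() */
    NODE_COUNT
};

struct dep_graph {
    /* edge[a][b]: b is acquired or waited for while a is held/running */
    bool edge[NODE_COUNT][NODE_COUNT];
};

static void add_dep(struct dep_graph *g, enum node held, enum node wanted)
{
    assert(held < NODE_COUNT && wanted < NODE_COUNT);
    g->edge[held][wanted] = true;
}

/* Depth-first search. state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool visit(const struct dep_graph *g, int n, int *state)
{
    state[n] = 1;
    for (int m = 0; m < NODE_COUNT; m++) {
        if (!g->edge[n][m])
            continue;
        if (state[m] == 1)
            return true;    /* back edge: circular dependency */
        if (state[m] == 0 && visit(g, m, state))
            return true;
    }
    state[n] = 2;
    return false;
}

static bool has_cycle(const struct dep_graph *g)
{
    int state[NODE_COUNT] = {0};

    for (int n = 0; n < NODE_COUNT; n++)
        if (state[n] == 0 && visit(g, n, state))
            return true;
    return false;
}

/*
 * Build the graph from the three traces. With track_dev_lock == false the
 * edges that involve dev->dev are never recorded, as if that mutex were
 * excluded from validation.
 */
static bool nfc_traces_have_cycle(bool track_dev_lock)
{
    struct dep_graph g = {0};

    add_dep(&g, REQ_LOCK, RX_WQ);           /* nci_close_device() */
    if (track_dev_lock) {
        add_dep(&g, RX_WQ, DEV_LOCK);       /* nci_rx_work() -> nci_ntf_packet() */
        add_dep(&g, DEV_LOCK, REQ_LOCK);    /* nfc_stop_poll() -> nci_request() */
    }
    return has_cycle(&g);
}
```

In this model nfc_traces_have_cycle(true) finds the
req_lock => rx_wq => dev->dev => req_lock loop, while
nfc_traces_have_cycle(false) finds nothing, which matches the report: the
dependency chain is only visible when dev->dev takes part in validation.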

I think we need to enable lockdep validation on the dev->dev mutex
( https://lkml.kernel.org/r/c7fb01a9-3e12-77ed-5c4c-db7deb64dc73@xxxxxxxxxxxxxxxxxxx ),
but has any alternative to my proposal at
https://lkml.kernel.org/r/1ad499bb-0c53-7529-ff00-e4328823f6fa@xxxxxxxxxxxxxxxxxxx
been proposed? If not, is it time to retry my proposal?