Re: [BUG] staging: wfx: possible deadlock in wfx_conf_tx() and wfx_add_interface()

From: Jia-Ju Bai
Date: Sat Feb 05 2022 - 03:35:22 EST




On 2022/2/1 19:33, Hillf Danton wrote:
On Tue, 1 Feb 2022 15:09:34 +0800 Jia-Ju Bai wrote:
Hello,

My static analysis tool reports a possible deadlock in the wfx driver in
Linux 5.16:

wfx_conf_tx()
  mutex_lock(&wdev->conf_mutex); --> Line 225 (Lock A)
  wfx_update_pm()
    wait_for_completion_timeout(&wvif->set_pm_mode_complete, ...); -->
Line 3019 (Wait X)

wfx_add_interface()
  mutex_lock(&wdev->conf_mutex); --> Line 737 (Lock A)
  complete(&wvif->set_pm_mode_complete); --> Line 758 (Wake X)

When wfx_conf_tx() is executed, "Wait X" is performed by holding "Lock
A". If wfx_add_interface() is executed at this time, "Wake X" cannot be
performed to wake up "Wait X" in wfx_conf_tx(), because "Lock A" has
been already hold by wfx_conf_tx(), causing a possible deadlock.
I find that "Wait X" is performed with a timeout, to relieve the
possible deadlock; but I think this timeout can cause inefficient execution.

I am not quite sure whether this possible problem is real and how to fix
it if it is real.
Any feedback would be appreciated, thanks :)


Best wishes,
Jia-Ju Bai
Hey Jia-Ju

Thank you for reporting it.

Given the init_completion() prior to complete() in wfx_add_interface(),
no waiter is waken up by the complete(), so it has nothing to do with
the waiter in the conf path.

Hi Hillf,

Thanks for your reply and detailed explanation :)


BTW if the unusual wfx init is a real use case then we can add a new helper.

Mind introducing your toy to LKML? How much time have you been put in it?
Its current status and future works?

Do you mean my static analysis tool that generated the report?
In fact, I spent 3-4 months of my part time on implementing this tool, which can detect deadlocks caused by locking cycles and improper waiting/waking operations.
This tool still reports some false positives, due to missing some special patterns in the kernel code, such as "init_completion() prior to complete()" in this false bug.
Thus, I am improving the tool to reduce false positives now.
Any suggestion on deadlock detection or the tool would be appreciated, thanks :)


Best wishes,
Jia-Ju Bai