Re: [PATCH] usb: gadget: uvc_video: unlock before submitting a request to ep

From: Dan Scally
Date: Thu Nov 16 2023 - 04:28:36 EST


CC Thinh - sorry to bother you, just want to make sure we fix this in the right place.

On 08/11/2023 11:48, Kuen-Han Tsai wrote:
On 02/11/2023 07:11, Piyush Mehta wrote:
There could be chances where the usb_ep_queue() could fail and trigger
complete() handler with error status. In this case, if usb_ep_queue()
is called with lock held and the triggered complete() handler is waiting
for the same lock to be cleared could result in a deadlock situation and
could result in system hang. To aviod this scenerio, call usb_ep_queue()
with lock removed. This patch does the same.
I would like to provide more background information on this problem.

We met a deadlock issue on Android devices and the followings are stack traces.

[35845.978435][T18021] Core - Debugging Information for Hardlockup core(8) - locked CPUs mask (0x100)
[35845.978442][T18021] Call trace:
[*][T18021] queued_spin_lock_slowpath+0x84/0x388
[35845.978451][T18021] uvc_video_complete+0x180/0x24c
[35845.978458][T18021] usb_gadget_giveback_request+0x38/0x14c
[35845.978464][T18021] dwc3_gadget_giveback+0xe4/0x218
[35845.978469][T18021] dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
[35845.978474][T18021] __dwc3_gadget_kick_transfer+0x34c/0x368
[35845.978479][T18021] __dwc3_gadget_start_isoc+0x13c/0x3b8
[35845.978483][T18021] dwc3_gadget_ep_queue+0x150/0x2f0
[35845.978488][T18021] usb_ep_queue+0x58/0x16c
[35845.978493][T18021] uvcg_video_pump+0x22c/0x518


I note in the kerneldoc comment for usb_ep_queue() that calling .complete() from within itself is specifically disallowed [1]:

    Note that @req's ->complete() callback must never be called from

    within usb_ep_queue() as that can create deadlock situations.


And it looks like that's what's happening here - is this something that needs addressing in the dwc3 driver?


Thanks

Dan


[1] https://elixir.bootlin.com/linux/v6.7-rc1/source/drivers/usb/gadget/udc/core.c#L275


As mentioned by Piyush, the uvcg_video_pump function acquires a spinlock before submitting the USB
request to the endpoint, which will be processed by the dwc3 controller in our case.

However, a deadlock can occur when the dwc3 controller fails to kick the transfer and decides to
cancel and clean up all requests. At this point, the dwc3 driver calls the giveback function asking
the corresponding driver to handle the cancellation. The uvcg_queue_cancel function then acquires
the same spinlock to cancel the request, which results in a double acquirement and a deadlock.

Signed-off-by: Piyush Mehta <piyush.mehta@xxxxxxx>
---
drivers/usb/gadget/function/uvc_video.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 91af3b1ef0d4..0a5d9ac145e7 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -460,11 +460,12 @@ static void uvcg_video_pump(struct work_struct *work)
req->no_interrupt = 1;
}

- /* Queue the USB request */
- ret = uvcg_video_ep_queue(video, req);
spin_unlock_irqrestore(&queue->irqlock, flags);

+ /* Queue the USB request */
+ ret = uvcg_video_ep_queue(video, req);
if (ret < 0) {
+ usb_ep_set_halt(video->ep);
uvcg_queue_cancel(queue, 0);
break;
}