Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

From: Michael Grzeschik
Date: Tue Oct 18 2022 - 15:13:41 EST


Hi Thinh,

On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
On Mon, Oct 17, 2022, Dan Vacura wrote:
On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> On Mon, Oct 17, 2022, Dan Vacura wrote:
> > From: Jeff Vanhoof <qjv001@xxxxxxxxxxxx>
> >
> > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > no_interrupt=1 is used. This can happen if the hardware is still using
> > the data associated with a TRB after the usb_request's ->complete call
> > has been made. Instead of immediately releasing a request when a Missed
> > ISOC interrupt has occurred, this change will add logic to cancel the
> > request instead where it will eventually be released when the
> > END_TRANSFER command has completed. This logic is similar to some of the
> > cleanup done in dwc3_gadget_ep_dequeue.
>
> This doesn't sound right. How did you determine that the hardware is
> still using the data associated with the TRB? Did you check the TRB's
> HWO bit?

The problem we're seeing was mentioned in the summary of this patch
series, issue #1. Basically, with the following patch
https://urldefense.com/v3/__https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@xxxxxxxxxxxxxx/__;!!A4F2R9G_pg!aSNZ-IjMcPgL47A4NR5qp9qhVlP91UGTuCxej5NRTv8-FmTrMkKK7CjNToQQVEgtpqbKzLU2HXET9O226AEN$
integrated, an smmu panic occurs on our Android device with the 5.15
kernel:

<3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!

The uvc gadget driver appears to be the first (and only) gadget that
uses the no_interrupt=1 logic, so this seems to be a new condition for
the dwc3 driver. In our configuration, we have up to 64 requests, and
no_interrupt=1 is set for up to 15 requests. The list size of
dep->started_list can get up to that amount when looping through it to
clean up the completed requests. From testing and debugging, the smmu
panic occurs when a -EXDEV status shows up, right after
dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
we had was the requests were getting returned to the gadget too early.
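
To make the batching concrete, here is a minimal standalone model of that
configuration. It is only a sketch: model_no_interrupt() and
count_interrupts() are hypothetical helpers, and the "interrupt on every
16th request" policy is an assumption derived from the numbers above, not
the actual uvc gadget code.

```c
#include <stdbool.h>

/*
 * Hypothetical policy: every `interval`-th queued request asks for a
 * completion interrupt, all others are queued with no_interrupt=1.
 */
static bool model_no_interrupt(unsigned int index, unsigned int interval)
{
	if (interval == 0)
		return false;	/* degenerate case: always interrupt */

	return ((index + 1) % interval) != 0;
}

/* Count how many of `nreqs` requests would raise a completion interrupt. */
static unsigned int count_interrupts(unsigned int nreqs, unsigned int interval)
{
	unsigned int i, count = 0;

	for (i = 0; i < nreqs; i++)
		if (!model_no_interrupt(i, interval))
			count++;

	return count;
}
```

Under this assumed policy, 64 requests with an interval of 16 produce only
four completion interrupts, so many requests can sit on the started list
between two cleanup passes.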

As I mentioned, if the status is updated to missed isoc, that means that
the controller returned ownership of the TRB to the driver. At least for
the particular request with -EXDEV, its TRBs are completed. I'm not
clear on your conclusion.
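
For illustration, the ownership argument can be sketched as a standalone
check. This is not the driver code: struct model_trb is a hypothetical,
trimmed-down TRB, and the bit positions are local stand-ins that mirror the
definitions in drivers/usb/dwc3/core.h as I recall them, so please verify
them against the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; assumed to match drivers/usb/dwc3/core.h. */
#define TRB_CTRL_HWO		(1u << 0)	/* hardware owns the TRB */
#define TRB_SIZE_TRBSTS(n)	(((n) >> 28) & 0x0fu)
#define TRBSTS_MISSED_ISOC	1u

/* Hypothetical TRB holding only the fields this sketch needs. */
struct model_trb {
	uint32_t size;	/* bits 31:28 carry the completion status */
	uint32_t ctrl;	/* bit 0 is the HWO (hardware owner) bit */
};

/* True while the controller may still access the TRB and its buffer. */
static bool trb_owned_by_hw(const struct model_trb *trb)
{
	return (trb->ctrl & TRB_CTRL_HWO) != 0;
}

/*
 * A missed isoc status is only meaningful once the controller has
 * written the status back and cleared HWO.
 */
static bool trb_missed_isoc(const struct model_trb *trb)
{
	return !trb_owned_by_hw(trb) &&
	       TRB_SIZE_TRBSTS(trb->size) == TRBSTS_MISSED_ISOC;
}
```

The point of the sketch is that a missed isoc status and a set HWO bit
should not be observed on the same TRB.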

Do we know where the crash occurred? Is it in the dwc3 driver or in the
uvc driver, and at what line? It'd be great if we could see the driver log.


>
> The dwc3 driver would only give back the requests if the TRBs of the
> associated requests are completed or when the device is disconnected.
> If the TRB indicated missed isoc, that means that the TRB is completed
> and its status was updated.

Interesting, the device is not disconnected as we don't get the
-ESHUTDOWN status back and with this patch in place things continue
after a -EXDEV status is received.


Actually, minor correction here: a recent change
b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
changed the request status from -ESHUTDOWN to -ECONNRESET when disabling
an endpoint. This doesn't look right.

While disabling an endpoint may also happen in other cases besides
disconnect, such as switching alternate interface, -ESHUTDOWN
seems more fitting there.

Hi Michael,

Can you help clarify the change above? It changed how requests are
returned: requests given back on disconnect are no longer returned with
-ESHUTDOWN.

When writing the patch, I was looking into
Documentation/driver-api/usb/error-codes.rst.

After looking into it today, I see that ESHUTDOWN should be sent on
ep_disable (device disable) and ECONNRESET on stop_active_transfer.
So I probably just mixed them up while writing the patch. :/

The followup patch would then just be to swap the status results of
__dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
dwc3_remove_requests call.
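
As a rough standalone model of the intended mapping (not the actual patch):
enum remove_reason and remove_status() below are hypothetical names made up
for this sketch, and the only thing taken from the discussion is which
status belongs to which path.

```c
#include <errno.h>

/* Hypothetical reasons for flushing an endpoint's request lists. */
enum remove_reason {
	REMOVE_EP_DISABLE,	/* __dwc3_gadget_ep_disable() path */
	REMOVE_STOP_TRANSFERS,	/* dwc3_stop_active_transfers() path */
};

/* Status that removed requests would be given back with after the swap. */
static int remove_status(enum remove_reason reason)
{
	switch (reason) {
	case REMOVE_EP_DISABLE:
		return -ESHUTDOWN;
	case REMOVE_STOP_TRANSFERS:
		return -ECONNRESET;
	}

	return -EINVAL;	/* unknown reason */
}
```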

Michael

>
> There's one special case in which dwc3 may give back requests early:
> the device disconnecting. The requests should be returned with
> -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> de-initialization anyway.
>
> We should not issue an End Transfer command just because of a missed
> isoc. We may want to issue End Transfer if the gadget driver is too slow and unable
> to feed requests in time (causing underrun and missed isoc) to resync
> with the host, but we already handle that.

Hmm, isn't that what happens when we get into this
condition in dwc3_gadget_endpoint_trbs_complete():

	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
	    list_empty(&dep->started_list) &&
	    (list_empty(&dep->pending_list) || status == -EXDEV))
		dwc3_stop_active_transfer(dep, true, true);


Yes, it's being handled there.
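
To spell out when that branch fires, here is the same condition as a
standalone predicate. This is a sketch only: struct ep_state and
should_stop_active_transfer() are hypothetical stand-ins for the endpoint
state that the driver reads from struct dwc3_ep.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the endpoint state used by the condition. */
struct ep_state {
	bool is_isoc;		/* usb_endpoint_xfer_isoc() result */
	bool started_empty;	/* list_empty(&dep->started_list) */
	bool pending_empty;	/* list_empty(&dep->pending_list) */
};

/* Mirrors the quoted condition: resync only once started_list has drained. */
static bool should_stop_active_transfer(const struct ep_state *ep, int status)
{
	return ep->is_isoc && ep->started_empty &&
	       (ep->pending_empty || status == -EXDEV);
}
```

In this model, a -EXDEV status alone is not enough: while requests remain
on the started list, the predicate stays false.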

>
> I'm still not clear on what problem you're seeing. Do you have the
> crash log? Tracepoints?
>

Appreciate the support!


Thanks,
Thinh

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |