Re: dwc3: unusual handling of setup requests with wLength == 0

From: Thinh Nguyen
Date: Wed Aug 23 2023 - 18:23:18 EST


On Wed, Aug 23, 2023, Alan Stern wrote:
> On Wed, Aug 23, 2023 at 05:59:07PM +0000, Thinh Nguyen wrote:
> > On Wed, Aug 23, 2023, Alan Stern wrote:
> > > STALL is not a valid status for usb_requests on the gadget side; it
> > > applies only on the host side (the host doesn't halt its endpoints).
> >
> > The host can send a CLEAR_FEATURE(halt_ep). This will reset the data
> > sequence of the endpoint. In xhci spec (section 4.6.8), it suggests to
> > send this when the endpoint is reset. The endpoint is reset typically
> > when there's a transaction error.
>
> It's important to be careful about the distinction between an actual
> endpoint in the gadget and the logical representation of an endpoint
> inside a host controller. The host cannot reset the first; it can only
> reset the second.
>
> So yes, the usb_clear_halt() routine on the host does a
> CLEAR_FEATURE(HALT) control transfer and then calls
> usb_reset_endpoint(), which calls usb_hcd_reset_endpoint().
>
> > The problem here is that typical protocol spec like MSC/UVC don't
> > specify how to handle CLEAR_FEATURE(halt_ep).
>
> MSC does specify this. I don't know about UVC.

No, from what I last recalled, it doesn't clearly define what should
happen here. It just indicates ClearFeature(halt_ep) for reset recovery.
However, the "reset recovery" can be implementation specific for how the
host can synchronize with the device.

>
> > For Windows MSC driver, when the host recovers from the transaction
> > error, it sends CLEAR_FEATURE(halt_ep) and expects the transfer to be
> > cancelled. To synchronize with the host, the gadget driver needs to
> > cancel the request. Dwc3 needs to notify the gadget driver of this.
>
> No, that's not what happens in the Mass Storage Class.
>
> For the Bulk-Only Transport version of MSC, when a Windows or Linux host
> detects a transaction error, it performs a USB port reset. This clears

No, that's implementation specific for reset recovery. Typically for
Windows, for the first recovery, it sends a ClearFeature(halt_ep) and
sends a new MSC command. If the transfer doesn't complete within a
specific time, there will be a timeout and a port reset, which is
another level of recovery.

> all the state on the gadget. The gadget gets re-enumerated, and the
> host proceeds to re-issue the MSC command. The gadget driver doesn't
> need any special notifications; outstanding requests get cancelled as a
> normal part of the reset handling.
>
> (In fact, this is not what the BOT spec says to do. It says that when
> the host detects a transaction error, it should a Bulk-Only Mass Storage
> Reset -- this is a special class-specific control transfer. In
> response, the gadget driver is supposed to reset its internal state and
> cancel all of its outstanding requests. Then the host issues
> CLEAR_FEATURE(HALT) to both the bulk-IN and bulk-OUT endpoints and
> proceeds to issue its next MSC command. A lot of MSC devices don't
> handle this properly, probably because Windows didn't use this
> approach.)

At the moment, the gadget driver doesn't handle CLEAR_FEATURE(halt_ep),
the UDC driver does. I don't recall this being handled in the composite
framework or in the f_mass_storage function driver. Unless we change
this, the UDC driver needs to notify the gadget driver somehow.

>
> In the UAS version of MSC, the endpoints never halt. If there's a
> transaction error, the host simply re-issues the transaction. If that

There are multiple levels of recovery. Different driver handles it
differently. For xHCI, Initially there's retry at the packet level
(typically set to retry 3 times in a row). If it fails, host controller
driver will get a transaction error event.

In Linux xHCI, the recovery for transaction error we perform soft-reset
(xhci reset ep command with TSP=1). If it still fails, we reset the
endpoint (TSP=0) and return the request with -EPROTO to the class
driver. However, we don't send ClearFeature(halt_ep). I don't recall
Linux MSC driver handle -EPROTO and do a port reset. However it does do
a port reset due to transfer timeout.

In Windows, it doesn't do soft-reset, but it does reset endpoint (TSP=0)
and send CLEAR_FEATURE(halt_ep) without port reset initially. It then
can send the a new MSC command expecting the device to be in sync based
on the CLEAR_FEATURE(halt_ep) request. If the recovery fails and the
transfer/command timed out, it will then do a port reset to recover.

> fails too, error recovery is started by the SCSI layer; it involves a
> USB port reset.
>
> But as you can see, in each case the UDC driver doesn't have to cancel
> anything in particular when it gets a Clear-Halt.
>
> > For other class driver, it may expect the transfer to resume after data
> > sequence reset.
>
> Indeed. In which case, the UDC driver shouldn't cancel anything.
>
> > As a result, for an endpoint that's STALL (or not), and if the host
> > sends CLEAR_FEATURE(halt_ep), the dwc3 returns the request with some
> > status code and let the gadget driver handle it. If the gadget driver
> > wants to cancel the transfer, it can drop the transfer. If the gadget
> > driver wants to resume, it can requeue the same requests with the saved
> > status to resume where it left off.
>
> The UDC driver should not dequeue a request merely because the endpoint
> is halted. The gadget driver can take care of everything necessary.
> After all, it knows when an endpoint gets halted, because the gadget

No, currently it doesn't know. That's the problem. The dwc3 driver
handles the CLEAR_FEATURE(halt_ep), not the gadget driver.

> driver is what calls usb_ep_set_halt() or usb_ep_set_wedge() in the
> first place.
>
> As for handling CLEAR_FEATURE(HALT), all the UDC driver needs to do is
> clear the HALT feature for the endpoint. (Although if the endpoint is
> wedged, the HALT feature should not be cleared.) It doesn't need to
> cancel any outstanding requests or inform the gadget driver in any way.

The UDC driver needs to notify the gadget driver somehow, cancelling the
request and give it back is currently the way dwc3 handling it.

>
> (Again, this is something that a lot of USB devices don't handle
> properly. They get very confused if the host sends a Clear-Halt
> transfer for an endpoint that isn't halted.)
>
> > > Putting this together, I get the following status codes:
> > >
> > > -ESHUTDOWN Request aborted because ep was disabled
> > > -EREMOTEIO Request was for an aborted control transfer
> > > -ECONNRESET Request was cancelled by usb_ep_dequeue()
> > > -EXDEV Data dropped (isoc only)
> > > -EOVERFLOW The host sent more data than the request wanted
> > > (will never happen if the request's length is a
> > > nonzero multiple of the maxpacket size)
> > >
> > > This applies only to the .status field of struct usb_request. Calls to
> > > usb_ep_queue() may return different error codes.
> > >
> > > How does that sound?
> > >
> >
> > That looks great!
>
> At some point I'll write a patch adding this to the documentation.
>

Thanks!
Thinh