Re: dwc3: unusual handling of setup requests with wLength == 0

From: Thinh Nguyen
Date: Tue Aug 29 2023 - 21:34:21 EST


On Fri, Aug 25, 2023, Alan Stern wrote:
> On Sat, Aug 26, 2023 at 01:20:34AM +0000, Thinh Nguyen wrote:
> > Sorry for the delay response.
> >
> > On Wed, Aug 23, 2023, Alan Stern wrote:
> > > In uas, a -EPROTO error will cause an error status to be returned to the
> > > SCSI layer, which will invoke the SCSI error handler. After enough
> > > failures it will call the uas device-reset handler, and
> > > uas_eh_device_reset_handler() calls usb_reset_device().
> >
> > From my testing with UASP, if I recall correctly, there's a 30 second
> > timeout before the reset handler kicks in.
>
> That timeout length comes from the SCSI error handler. I believe the
> user can control the length by setting a sysfs attribute. Also, it
> should be possible to change the uas driver to make it perform a reset
> on its own right away, the way usb-storage does, without waiting for
> the SCSI error handler -- if this matters.

Sure, but my point is that the recovery for this is handled differently
compared Windows. Windows tries to recover with CLEAR_FEATURE(halt_ep)
before it gets to usb reset.

>
>
> > From the xHCI spec, it suggests to issue a CLEAR_FEATURE(halt_ep) after
> > the endpoint reset (e.g. from transaction error). Whether this action
> > should be taken from the class driver or from the xHCI driver, it's not
> > clear. However, as you said, without Bulk-Only Mass Storage Reset
> > request, the host and device will be out of sync. The response action
> > taken from xHCI is independent from MSC protocol. So it would make sense
> > for the UDC driver to treat CLEAR_FEATURE(halt_ep) as such and not
> > expect a Bulk-Only Mass Storage Reset or the equivalent.
>
> In USB-2, performing an endpoint reset in the host controller together
> with sending a Clear-Halt is dangerous. It can lead to data
> duplication.
>
> Here's how that can happen. Suppose the data toggles on both the host
> and gadget side are initially equal to 0 when a bulk-OUT transaction
> occur. The host sends a DATA0 packet which the gadget receives,
> causing the gadget's data toggle to change to 1. But let's say the
> gadget's ACK handshake gets corrupted, causing a protocol error on the
> host. So the host does an internal endpoint reset and sends a
> Clear-Halt to the gadget. When the gadget processes this command, it
> resets its data toggle back to 0. Now the host resends the same DATA0
> packet as before -- and the gadget accepts it the second time because
> its data toggle is set to 0! The duplicated data leads to
> corruption. If the gadget's data toggle had remained 1 then it would
> have acknowledged the duplicate DATA0 packet but otherwise ignored it,
> avoiding the corruption.
>
> I admit the probability of this happening is very low, but a more
> robust error recovery procedure at the class level would prevent this
> scenario.
>

This actually all the more that we should not silently ignore
CLEAR_FEATURE(halt_ep) and depend on transfer timeout and usb reset.

That reminds me another thing, if the host (xhci in this case) does a
hard reset to the endpoint, it also resets the TRB pointer with dequeue
ep command. So, the transfer should not resume. It needs to be
cancelled. This xHCI behavior is the same for Windows and Linux.

>
> > > > The UDC driver needs to notify the gadget driver somehow, cancelling the
> > > > request and give it back is currently the way dwc3 handling it.
> > >
> > > And I'm saying that the UDC driver does _not_ need to notify the gadget
> > > driver.
> > >
> > > The MSC gadget driver works just fine without any such notification.
> > > Can you name any gadget driver that _does_ need a notification?
> > >
> >
> > We were testing against UASP. The reason I added this was to mimic the
> > behavior of common vendor UASP devices when it recovers from transaction
> > errors in Windows.
> >
> > When the data sequence of a transfer is reset, the host needs to send
> > CLEAR_FEATURE(halt_ep). It should be a common behavior. Since it is
> > common and part of the xHCI spec, do we expect the xHCI to send this or
> > do we expect the class protocol to document and handle this? At the
> > moment, I expect it to be the former and expect the UDC driver to treat
> > it as a common synchronization that perhaps the only synchronization
> > mechanism the upper protocol depends on.
>
> I think it should be the opposite; the class protocol should specify
> how to recover from errors. If for no other reason then to avoid the
> data duplication problem for USB-2. However, if it doesn't specify a
> recovery procedure then there's not much else you can do.

Right, unfortunately that's not always the case that class protocol
spell out how to handle transaction error.

>
> But regardless, how can the gadget driver make any use of the
> knowledge that the UDC received a Clear-Halt? What would it do
> differently? If the intent is simply to clear an error condition and
> continue with the existing transfer, the gadget driver doesn't need to
> do anything.

It's not simple to clear an error. It is to notify the gadget driver to
cancel the active transfer and resync with the host. This is observed in
UASP driver in Windows and how various consumer UASP devices handle it.
There no eqivalent of Bulk-Only Mass Storage Reset request from the
class protocol. We still have the USB analyzer traces for this.

>
> Alternatively, if the procedure for clearing an error condition is
> given by the class protocol then the protocol should spell out a
> method for the host to inform the gadget about what it is doing --
> something more than just sending a Clear-Halt.
>

Regardless whether the class protocol spells out how to handle the
transaction error, if there's transaction error, the host may send
CLEAR_FEATURE(halt_ep) as observed in Windows. The gadget driver needs
to know about it to cancel the active transfer and resync with the host.

Of course we can just wait for the host to timeout the transfer and
issue a usb reset. While I agree that the class protocol should
spell out everything, it may not always be the case. And I think UDC
driver should be handled this way to inter-op with windows.

BR,
Thinh