Re: dwc3: unusual handling of setup requests with wLength == 0

From: Alan Stern
Date: Fri Aug 25 2023 - 23:11:48 EST


On Sat, Aug 26, 2023 at 01:20:34AM +0000, Thinh Nguyen wrote:
> Sorry for the delay response.
>
> On Wed, Aug 23, 2023, Alan Stern wrote:
> > In uas, a -EPROTO error will cause an error status to be returned to the
> > SCSI layer, which will invoke the SCSI error handler. After enough
> > failures it will call the uas device-reset handler, and
> > uas_eh_device_reset_handler() calls usb_reset_device().
>
> From my testing with UASP, if I recall correctly, there's a 30 second
> timeout before the reset handler kicks in.

That timeout length comes from the SCSI error handler. I believe the
user can control the length by setting a sysfs attribute. Also, it
should be possible to change the uas driver to make it perform a reset
on its own right away, the way usb-storage does, without waiting for
the SCSI error handler -- if this matters.


> From the xHCI spec, it suggests to issue a CLEAR_FEATURE(halt_ep) after
> the endpoint reset (e.g. from transaction error). Whether this action
> should be taken from the class driver or from the xHCI driver, it's not
> clear. However, as you said, without Bulk-Only Mass Storage Reset
> request, the host and device will be out of sync. The response action
> taken from xHCI is independent from MSC protocol. So it would make sense
> for the UDC driver to treat CLEAR_FEATURE(halt_ep) as such and not
> expect a Bulk-Only Mass Storage Reset or the equivalent.

In USB-2, performing an endpoint reset in the host controller together
with sending a Clear-Halt is dangerous. It can lead to data
duplication.

Here's how that can happen. Suppose the data toggles on both the host
and gadget side are initially equal to 0 when a bulk-OUT transaction
occur. The host sends a DATA0 packet which the gadget receives,
causing the gadget's data toggle to change to 1. But let's say the
gadget's ACK handshake gets corrupted, causing a protocol error on the
host. So the host does an internal endpoint reset and sends a
Clear-Halt to the gadget. When the gadget processes this command, it
resets its data toggle back to 0. Now the host resends the same DATA0
packet as before -- and the gadget accepts it the second time because
its data toggle is set to 0! The duplicated data leads to
corruption. If the gadget's data toggle had remained 1 then it would
have acknowledged the duplicate DATA0 packet but otherwise ignored it,
avoiding the corruption.

I admit the probability of this happening is very low, but a more
robust error recovery procedure at the class level would prevent this
scenario.


> > > The UDC driver needs to notify the gadget driver somehow, cancelling the
> > > request and give it back is currently the way dwc3 handling it.
> >
> > And I'm saying that the UDC driver does _not_ need to notify the gadget
> > driver.
> >
> > The MSC gadget driver works just fine without any such notification.
> > Can you name any gadget driver that _does_ need a notification?
> >
>
> We were testing against UASP. The reason I added this was to mimic the
> behavior of common vendor UASP devices when it recovers from transaction
> errors in Windows.
>
> When the data sequence of a transfer is reset, the host needs to send
> CLEAR_FEATURE(halt_ep). It should be a common behavior. Since it is
> common and part of the xHCI spec, do we expect the xHCI to send this or
> do we expect the class protocol to document and handle this? At the
> moment, I expect it to be the former and expect the UDC driver to treat
> it as a common synchronization that perhaps the only synchronization
> mechanism the upper protocol depends on.

I think it should be the opposite; the class protocol should specify
how to recover from errors. If for no other reason then to avoid the
data duplication problem for USB-2. However, if it doesn't specify a
recovery procedure then there's not much else you can do.

But regardless, how can the gadget driver make any use of the
knowledge that the UDC received a Clear-Halt? What would it do
differently? If the intent is simply to clear an error condition and
continue with the existing transfer, the gadget driver doesn't need to
do anything.

Alternatively, if the procedure for clearing an error condition is
given by the class protocol then the protocol should spell out a
method for the host to inform the gadget about what it is doing --
something more than just sending a Clear-Halt.

Alan Stern