Re: dwc3: unusual handling of setup requests with wLength == 0

From: Thinh Nguyen
Date: Wed Aug 30 2023 - 22:44:24 EST


On Wed, Aug 30, 2023, Alan Stern wrote:
> On Wed, Aug 30, 2023 at 01:32:28AM +0000, Thinh Nguyen wrote:
> > That reminds me of another thing, if the host (xhci in this case) does a
> > hard reset to the endpoint, it also resets the TRB pointer with dequeue
> > ep command. So, the transfer should not resume. It needs to be
> > cancelled. This xHCI behavior is the same for Windows and Linux.
>
> That's on the host side, right? How does this affect the gadget side?
>
> That is, cancelling a transfer on the host doesn't necessarily mean it
> has to be cancelled on the gadget. Does it have any implications at all
> for the gadget driver?

There are 2 things that need to be in sync between host and device:
1) The data sequence.
2) The transfer.

If the host doesn't send CLEAR_FEATURE(halt_ep), best case scenario, the
data sequence doesn't match and the host issues a usb reset after some
timeout because the packet won't go through. Worst case scenario, the
data sequence matches 0, and the wrong data is received, causing
corruption.
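
To make the data sequence part concrete, here is a toy model. It is only
a sketch, not dwc3 or xhci code: the Endpoint class and the function
name are made up for illustration, and it assumes a SuperSpeed bulk
endpoint (5-bit sequence number) where both sides had counted the same
number of packets before the error.

```python
SEQ_MODULO = 32  # SuperSpeed sequence numbers are 5 bits wide (0..31)


class Endpoint:
    """One side's view of the pipe: just the next expected sequence number."""

    def __init__(self):
        self.seq = 0

    def advance(self, packets):
        self.seq = (self.seq + packets) % SEQ_MODULO

    def reset_sequence(self):
        self.seq = 0


def after_host_endpoint_reset(packets_done, sends_clear_halt):
    """Hypothetical helper: classify the state after the host resets its endpoint.

    packets_done     -- packets both sides counted before the error (assumption)
    sends_clear_halt -- whether the host also sends CLEAR_FEATURE(halt_ep)
    """
    host, device = Endpoint(), Endpoint()
    host.advance(packets_done)
    device.advance(packets_done)

    # The host-side endpoint reset puts the host's sequence back to 0.
    host.reset_sequence()

    if sends_clear_halt:
        # CLEAR_FEATURE(halt_ep) resets the device's sequence as well.
        device.reset_sequence()
        return "resynced"
    if device.seq == host.seq:
        # Worst case: the device happens to sit at 0 too, so packets go through.
        return "accidental match"
    # Best case: packets are rejected until the host times out and resets.
    return "mismatch"


print(after_host_endpoint_reset(5, False))
print(after_host_endpoint_reset(32, False))
print(after_host_endpoint_reset(5, True))
```

The "accidental match" case is the dangerous one: nothing on the wire
looks wrong, but the two sides may no longer agree on which transfer the
data belongs to.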

If the device doesn't cancel the transfer in response to
CLEAR_FEATURE(halt_ep), it may send/receive data of a different transfer
because the host doesn't resume where it left off, causing corruption.

Based on the class protocol, the class driver and gadget driver know
what makes up a "transfer" and can appropriately cancel a transfer to
stay in sync.

>
> > > I think it should be the opposite; the class protocol should specify
> > > how to recover from errors. If for no other reason then to avoid the
> > > data duplication problem for USB-2. However, if it doesn't specify a
> > > recovery procedure then there's not much else you can do.
> >
> > Right, unfortunately it's not always the case that the class protocol
> > spells out how to handle a transaction error.
>
> All too true...
>
> > > But regardless, how can the gadget driver make any use of the
> > > knowledge that the UDC received a Clear-Halt? What would it do
> > > differently? If the intent is simply to clear an error condition and
> > > continue with the existing transfer, the gadget driver doesn't need to
> > > do anything.
> >
> > It's not simply to clear an error. It is to notify the gadget driver to
> > cancel the active transfer and resync with the host.
>
> How does the gadget driver sync with the host if the class protocol
> doesn't say what should be done?
>
> Also, what if there is no active transfer? That is, what if the
> transaction that got an error on the host appeared to be successful on
> the gadget and it was the last transaction in the final transfer queued
> for the endpoint? How would the UDC driver notify the gadget driver in
> this situation?

That's fine. If there's no active transfer, the gadget doesn't need to
cancel anything. As long as the host knows that the transfer did not
complete, it can retry and be in sync. For UASP, the host will send a
new MSC command to retry the failed transfer, i.e., the host would
overwrite/re-read the transfer with the same transfer offset.

The problem arises if the gadget attempts to resume the incomplete
transfer.

>
> > This is observed in
> > UASP driver in Windows and how various consumer UASP devices handle it.
>
> I don't understand what you're saying here. How can you observe whether
> a transfer is cancelled in a consumer UAS device? And how does the
> consumer device resync with the host?

You can see a hang if the transfers are out of sync. If the transfer
isn't cancelled, the device would only source/sink whatever remains of
the previous transfer, which is not enough to complete the new transfer.
The new transfer is seen as incomplete by the host, hence the hang and
the usb reset.
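
As a rough illustration of that hang, here is a small sketch for the IN
direction. The function names and the byte counts are hypothetical; the
model only assumes that a device which does not cancel keeps servicing
the leftover part of the old transfer.

```python
def bytes_device_sources(prev_total, prev_done, new_len, device_cancels):
    """Bytes the device supplies when the host retries with a new transfer.

    prev_total     -- size of the transfer that hit the error
    prev_done      -- bytes the device had already moved for it
    new_len        -- size of the host's retried transfer
    device_cancels -- whether the gadget cancelled the old transfer
    """
    if device_cancels:
        # The old transfer is dropped; the device services the new one in full.
        return new_len
    # The old transfer is still queued, so only its leftover part goes out.
    remaining = prev_total - prev_done
    return min(remaining, new_len)


def host_sees_incomplete(prev_total, prev_done, new_len, device_cancels):
    """True when the host's new transfer cannot complete (hang, then usb reset)."""
    sourced = bytes_device_sources(prev_total, prev_done, new_len, device_cancels)
    return sourced < new_len


KIB = 1024
# A 64 KiB transfer fails after 48 KiB and the host retries all 64 KiB.
print(bytes_device_sources(64 * KIB, 48 * KIB, 64 * KIB, device_cancels=False))
print(host_sees_incomplete(64 * KIB, 48 * KIB, 64 * KIB, device_cancels=False))
print(host_sees_incomplete(64 * KIB, 48 * KIB, 64 * KIB, device_cancels=True))
```

Without the cancel, the device has only the 16 KiB leftover to give, so
the host's 64 KiB retry never completes.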

>
> > There's no equivalent of the Bulk-Only Mass Storage Reset request from the
> > class protocol. We still have the USB analyzer traces for this.
>
> Can you post an example? Not necessarily in complete detail, but enough
> so that we can see what's going on.
>
> > Regardless whether the class protocol spells out how to handle the
> > transaction error, if there's transaction error, the host may send
> > CLEAR_FEATURE(halt_ep) as observed in Windows. The gadget driver needs
> > to know about it to cancel the active transfer and resync with the host.
>
> I'll be able to understand this better after seeing an example. Do you
> have any traces that were made for a High-speed connection (say, using
> a USB-2 cable)? It would probably be easier to follow than a SuperSpeed
> example.
>

Unfortunately I only have LeCroy usb analyzer traces of Gen 2x1, not for
usb2 speed. It's a bit tricky converting it to text with all the proper
info to see all the context. If my explanation isn't clear, I'll try to
figure out how to proceed.

Thanks,
Thinh