Re: [RFC][PATCH] usb: dwc3: usb: dwc3: Force stop EP0 transfers during pullup disable

From: Wesley Cheng
Date: Mon Aug 16 2021 - 15:14:01 EST


Hi Thinh,

On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
> Felipe Balbi wrote:
>>
>> Hi,
>>
>> Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes:
>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx>
>>>>>>>>>>> ---
>>>>>>>>>>> drivers/usb/dwc3/ep0.c | 4 ++--
>>>>>>>>>>> drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>> drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>> 3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>> return ret;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>> {
>>>>>>>>>>> struct dwc3_ep *dep;
>>>>>>>>>>>
>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>> __dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>> {
>>>>>>>>>>> struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>> u32 cmd;
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>> msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>> if (ret == 0) {
>>>>>>>>>>> dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>> - return -ETIMEDOUT;
>>>>>>>>>>> + spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>> + dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>> + dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>
>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>> be disconnected?
>>>>>>>>>
>>>>>>>>> I can try without restarting it again, and see if that works. Instead
>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>
>>>>>>>>
>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>> (including not clearing the HWO and remaining transfer size). The driver
>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>
>>>>>>>
>>>>>>> I see. Currently, in the place that we do use
>>>>>>> dwc3_ep0_end_control_data(), it's followed by
>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer. For
>>>>>>
>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>> control write request. Oftentimes we do control reads and not writes.
>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>> direction control endpoint).
>>>>>>
>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>> done in:
>>>>>
>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>> const struct dwc3_event_depevt *event)
>>>>> {
>>>>> ...
>>>>> if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>> struct dwc3_ep *dep = dwc->eps[dwc->ep0_expect_in];
>>>>>
>>>>> dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>> dwc3_ep0_end_control_data(dwc, dep);
>>>>> dwc3_ep0_stall_and_restart(dwc);
>>>>> return;
>>>>> }
>>>>>
>>>
>>> Looking at this snippet again, it looks wrong. For control write
>>> unexpected direction, if the driver hasn't set up and started the DATA
>>> phase yet, then it's fine, but there is a problem if it did.
>>>
>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>> ep0 due to the resource_index check, it doesn't follow the control
>>
>> IIRC resource_index is always non-zero, so the command should be
>
> No, resource_index for ep0out is 0 and for ep0in is 1. You can check any
> driver tracepoint log for the return value of the Start Transfer
> command to see the resource index of ep0. There may have been a mix-up
> with the undocumented return value of the Set Endpoint Transfer Resource
> command when this code was written; don't confuse the two.
>
>> triggered. If you have access to a LeCroy USB Trainer, could you script
>> this very scenario for verification?
>
> For anyone who wants to work on this, we don't need a LeCroy USB
> trainer. If you use an xhci host, just modify xhci-ring.c to queue a
> wrong-direction DATA phase TRB for a particular control write request
> test, and continue with the next control requests.
>
Let me give this a try since I already have a modified (broken :)) XHCI
stack.

Thanks
Wesley Cheng
>>
>>> transfer flow model in the programming guide. This may cause
>>> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
>>> with SETUP stage. Also, if the ep0 is already started, the driver won't
>>> issue Start Transfer command again.
>>
>>> This issue is unlikely to occur unless we see a misbehaving host for
>>> control write request. Regardless, we need to fix this. I may need some
>>
>> right, it would be a misbehaving host, however databook called it out as
>> something that _can_ happen. Moreover, I have vague memories of this
>> being one of the test cases in LeCroy's USB Certification Suite.
>>
>
> Yes, it's something that can happen, and dwc3 should be able to handle
> it. If you remember which test in particular that tests this, let me
> know. I want to check how it was passed.
>
>>> time before I can create a patch and test it. If you or anyone is up to
>>> take this on, it'd be highly appreciated.
>>
>> Before we go ahead writing a patch for this, I'd really like to see
>> traces showing this failure and a minimal reproducer. The reproducer
>> would probably have to be a script for LeCroy's USB Trainer.
>>
>> Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
>> major release (before I lost access to all my USB gear heh).
>>
>
> Are you referring to Ch9 USBCV? I don't recall there being a particular
> test for this.
>
> There should be a red flag whenever we see End Transfer command
> immediately followed by a Start Transfer command without waiting for
> End Transfer completion. Though, in this case, we don't go through with
> the End Transfer for ep0 due to the resource_index check in
> dwc3_ep0_end_control_data().
>
> BR,
> Thinh
>

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
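
For readers following the resource_index point in the thread, here is a
toy user-space C model of it. This is not the dwc3 driver code: struct
toy_ep, both toy_end_transfer_*() helpers and the transfer_started field
are hypothetical names made up for illustration. It only assumes what the
thread states, namely that the End Transfer path is guarded by a
resource_index check and that Start Transfer returns resource index 0 for
ep0out and 1 for ep0in. The second helper sketches one possible
alternative guard (explicit state instead of the index value); it is an
assumption, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for an endpoint; NOT the kernel's struct dwc3_ep. */
struct toy_ep {
	const char *name;
	unsigned int resource_index;	/* value returned by Start Transfer */
	bool transfer_started;		/* hypothetical "transfer is active" state */
};

/*
 * Models a guard keyed on resource_index.
 * Returns true if End Transfer would be issued.
 */
static bool toy_end_transfer_by_index(struct toy_ep *dep)
{
	if (!dep->resource_index)
		return false;	/* ep0out (index 0) exits here even if active */

	dep->resource_index = 0;
	dep->transfer_started = false;
	return true;
}

/*
 * Hypothetical alternative: key the guard on explicit transfer state,
 * so an active endpoint with resource index 0 is still ended.
 */
static bool toy_end_transfer_by_flag(struct toy_ep *dep)
{
	if (!dep->transfer_started)
		return false;

	dep->resource_index = 0;
	dep->transfer_started = false;
	return true;
}

/* Small driver for the model; prints what each guard decides. */
void toy_demo(void)
{
	struct toy_ep ep0out = { "ep0out", 0, true };
	struct toy_ep ep0in = { "ep0in", 1, true };
	bool issued;

	issued = toy_end_transfer_by_index(&ep0out);
	printf("%s: index guard issued End Transfer: %d\n", ep0out.name, issued);
	assert(ep0out.transfer_started);	/* still active, nothing was ended */

	issued = toy_end_transfer_by_index(&ep0in);
	printf("%s: index guard issued End Transfer: %d\n", ep0in.name, issued);

	issued = toy_end_transfer_by_flag(&ep0out);
	printf("%s: flag guard issued End Transfer: %d\n", ep0out.name, issued);
}
```

In this model the index-based guard silently skips ep0out while its
transfer is still marked active, which is the situation described above
where a following Start Transfer would not go through as expected.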