Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup

From: Johan Hovold
Date: Mon Jun 19 2023 - 11:07:31 EST


On Mon, Jun 19, 2023 at 06:20:43PM +0530, Krishna Kurapati PSSNV wrote:
> On 6/19/2023 12:36 PM, Johan Hovold wrote:
> > On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:

> >> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
> >>  	ret = pm_runtime_get_sync(dwc->dev);
> >>  	if (!ret || ret < 0) {
> >>  		pm_runtime_put(dwc->dev);
> >> -		return 0;
> >> +		if (ret < 0)
> >> +			pm_runtime_set_suspended(dwc->dev);
> >
> > This bit is broken and is also not mentioned or explained in the commit
> > message. What are you trying to achieve here?
> >
> > You cannot set the state like this after runtime PM is enabled and the
> > above call will always fail.

> The reason why I am returning ret is because, when the first get_sync
> fails because of core_init failure and we return 0 instead of ret, the
> UDC thinks that controller has started successfully but we never set the
> run stop bit.

That bit is clear.

> So when we unplug the cable, the disconnect event won't
> be generated and, on systems like Android, user space will never
> clear the UDC upon disconnect. It's a sort of mismatch between the
> controller and the UDC.

Ok, but the controller is in an error state after the resume failure. And
here you rely on user space to retry gadget activation in order to
eventually detect the disconnect event?

> Also once the first get_sync fails, the dwc->dev->power.runtime_error
> flag is set and successive calls to get_sync always return -EINVAL. In
> this situation even if UDC/configfs retry pullup, resume_common will
> never be called and we never actually start the controller or resume
> dwc->dev.
>
> By calling set_suspended, I am trying to clear the runtime_error flag so
> that the next retry to pullup will call resume_common and retry
> core_init and set run_stop.

Ok, thanks, that's the bit I was missing in the commit message.

First, I perhaps mistakenly thought pm_runtime_set_suspended() may only
be called with runtime PM disabled, but it appears it may indeed be
valid to call it also after an error, with the caveat that the device
must then actually be in the suspended state.
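
To make sure we are talking about the same sequence, here is a minimal
sketch as a toy userspace model in C. This is not kernel code: every
toy_* name and the resume_failures_left knob are made up for
illustration. It assumes, as described above, that a failed resume
latches runtime_error, that later get_sync calls then return -EINVAL
without invoking the resume callback, and that set_suspended clears the
latched error. It ignores disable_depth, locking and parent/child
accounting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of the behaviour discussed above, NOT kernel code. */
enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

struct toy_dev {
	enum toy_status status;
	int runtime_error;		/* latched on a failed resume */
	int usage_count;
	int resume_failures_left;	/* hypothetical knob for the demo */
	int resume_calls;		/* how often the callback really ran */
};

/* Stand-in for the driver's resume callback (core init in the thread). */
static int toy_resume_cb(struct toy_dev *d)
{
	d->resume_calls++;
	if (d->resume_failures_left > 0) {
		d->resume_failures_left--;
		return -ETIMEDOUT;
	}
	return 0;
}

/* Returns 1 if already active, 0 if resumed now, negative errno on error. */
static int toy_get_sync(struct toy_dev *d)
{
	int ret;

	d->usage_count++;		/* bumped even when we fail */
	if (d->runtime_error)
		return -EINVAL;		/* callback is not invoked again */
	if (d->status == TOY_ACTIVE)
		return 1;
	ret = toy_resume_cb(d);
	if (ret) {
		d->runtime_error = ret;
		return ret;
	}
	d->status = TOY_ACTIVE;
	return 0;
}

static void toy_put(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Only accepted here when an error is latched; clears that error. */
static int toy_set_suspended(struct toy_dev *d)
{
	if (!d->runtime_error)
		return -EAGAIN;
	d->status = TOY_SUSPENDED;
	d->runtime_error = 0;
	return 0;
}

/* Shape of the pullup error path, with and without the extra call. */
static int toy_pullup(struct toy_dev *d, bool clear_error)
{
	int ret = toy_get_sync(d);

	if (!ret || ret < 0) {
		toy_put(d);
		if (ret < 0 && clear_error)
			toy_set_suspended(d);
		return ret;
	}
	/* Already active: the run/stop bit would be set here. */
	toy_put(d);
	return 0;
}
```

With clear_error false, a second toy_pullup() after one failed resume
returns -EINVAL and resume_calls stays at 1. With clear_error true, the
second call reaches the resume callback again.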

The documentation and implementation are inconsistent here, as the
kernel-doc for pm_runtime_set_suspended() clearly states:

	It is not valid to call this function for devices with runtime
	PM enabled.

and it also looks like we'd end up with an active-child counter
imbalance if anyone actually tries to do so.

But either way, it also seems like the controller is not guaranteed to
be suspended here, as pm_runtime_get_sync() may also fail after
previous errors that have left the controller in the active state?
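
A rough illustration of that concern, again as a toy userspace model
rather than kernel code (all model_* names are hypothetical). It
assumes that a failed suspend can leave the hardware powered while an
error stays latched, so that the next get_sync fails with -EINVAL even
though nothing is suspended:

```c
#include <errno.h>
#include <stdbool.h>

enum hw_state { HW_SUSPENDED, HW_ACTIVE };

struct model_dev {
	enum hw_state hw;	/* what the hardware is really doing */
	enum hw_state recorded;	/* what the PM bookkeeping believes */
	int runtime_error;	/* latched error from a failed callback */
};

/* Assumed failure mode: suspend fails, hardware stays powered. */
static void model_failed_suspend(struct model_dev *d, int err)
{
	d->runtime_error = err;	/* hw and recorded both stay HW_ACTIVE */
}

/* Returns 1 if already active, 0 if resumed now, -EINVAL if latched. */
static int model_get_sync(struct model_dev *d)
{
	if (d->runtime_error)
		return -EINVAL;
	if (d->recorded == HW_ACTIVE)
		return 1;
	d->hw = HW_ACTIVE;
	d->recorded = HW_ACTIVE;
	return 0;
}

/* Error path as in the patch: any failure marks the device suspended. */
static void model_pullup_error_path(struct model_dev *d)
{
	if (model_get_sync(d) < 0) {
		d->recorded = HW_SUSPENDED;
		d->runtime_error = 0;
	}
}

/* True when the bookkeeping matches the real hardware state. */
static bool model_is_consistent(const struct model_dev *d)
{
	return d->hw == d->recorded;
}
```

Starting from an active device, model_failed_suspend() followed by
model_pullup_error_path() leaves recorded at HW_SUSPENDED while hw is
still HW_ACTIVE, which is the mismatch I am worried about.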

Also, what kind of errors would cause core_init and resume to fail here?

If this is something that you see during normal operation, then that
suggests that something is wrong with the runtime PM implementation.

Note that virtually all drivers treat resume failures as fatal errors
and do not implement any recovery from that.

In fact, the only other example of this kind of usage that I could find
is also for a Qualcomm driver...

Johan