Re: [PATCH] usb: gadget: f_fs: Wake up IO thread during disconnect

From: John Keeping
Date: Fri Dec 03 2021 - 06:31:09 EST


Hi Wesley,

On Thu, Dec 02, 2021 at 11:33:40PM -0800, Wesley Cheng wrote:
> On 12/2/2021 6:49 AM, John Keeping wrote:
> > On Wed, Dec 01, 2021 at 04:41:10PM +0000, John Keeping wrote:
> >> On Wed, Dec 01, 2021 at 02:02:05AM -0800, Wesley Cheng wrote:
> >>> During device disconnect or composition unbind, applications should be
> >>> notified that the endpoints are no longer enabled, so that it can take
> >>> the proper actions to handle its IO threads. Otherwise, they can be
> >>> left waiting for endpoints until EPs are re-enabled.
> >>>
> >>> Signed-off-by: Wesley Cheng <quic_wcheng@xxxxxxxxxxx>
> >>> ---
> >>> drivers/usb/gadget/function/f_fs.c | 7 +++++--
> >>> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
> >>> index 3c584da9118c..0b0747d96378 100644
> >>> --- a/drivers/usb/gadget/function/f_fs.c
> >>> +++ b/drivers/usb/gadget/function/f_fs.c
> >>> @@ -957,10 +957,12 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
> >>> if (file->f_flags & O_NONBLOCK)
> >>> return -EAGAIN;
> >>>
> >>> - ret = wait_event_interruptible(
> >>> - epfile->ffs->wait, (ep = epfile->ep));
> >>> + ret = wait_event_interruptible(epfile->ffs->wait,
> >>> + (ep = epfile->ep) || !epfile->ffs->func);
> >
> > I looked at this again, and doesn't this totally break the wait
> > condition?
> >
> > epfile->ep is set to non-null in ffs_func_eps_enable() which is called
> > from ffs_func_set_alt() just after ffs->func is set to non-null, and
> > then those are also set back to null at the same time.
> >
> > So the condition boils down to a || !a and this never blocks. Or am I
> > missing something?
>
> Thanks for the feedback. Hmm...yes, I get what you're saying. The
> EPfiles and func is basically being set/cleared together, so the above
> change wouldn't be any different than checking for ep != epfile->ep.
> Let me see if there's another way we can address the issue this change
> is trying to resolve.
>
> >
> >>> if (ret)
> >>> return -EINTR;
> >>> + if (!epfile->ffs->func)
> >>> + return -ENODEV;
> >>
> >> This seems strange - we are inside the case where the endpoint is not
> >> initially enabled, if we're returning ENODEV here shouldn't that happen
> >> in all cases?
> >>
> >> Beyond that, there is no locking for accessing ffs->func here;
> >> modification happens in gadget callbacks so it's guarded by the gadget
> >> core (the existing case in ffs_ep0_ioctl() looks suspicious as well).
> >>
> >> But I can't see why this change is necessary - there are event
> >> notifications through ep0 when this happens, as can be seen in the hunk
> >> below from the ffs_event_add(ffs, FUNCTIONFS_DISABLE) line. If
> >> userspace cares about this, then it can read the events from ep0.
> >>
> In short, the change is basically trying to resolve an issue in an
> application that has a separate thread handling the IO ops. When the
> USB cable is disconnected, the application would expect for this IO
> thread to be completed and exit gracefully, and restarting it on the
> next connect. However, since we are stuck in the read() it can not
> proceed further.

Did you see the other recent thread on FunctionFS [1]? It seems that
the separate I/O thread is not required and in fact there is a race here
in general even using AIO via io_submit().

[1] https://lore.kernel.org/linux-usb/CAJkDvNk4G3WJuitZa+oPuNhkrCPNiwwG-zyNET+urWVWAda+cw@xxxxxxxxxxxxxx/

> I guess in these situations, we should utilize the O_NONBLOCK file
> parameter?

Yes, using O_NONBLOCK does avoid the problems.

I'm not sure what the general solution is - right now I don't see any
way to resolve the requirements to wait for the host to connect and
handle disconnection without blocking here.

The simplest thing would be to refuse epfile I/O when FunctionFS is not
enabled, which neatly resolves the race in favour of returning an error.
But it means that applications need to wait for a FUNCTIONFS_ENABLE
event on ep0 before submitting any transfers on other endpoints, which
is a change from the current behaviour. And there's no way to know how
many applications rely on that.

So I don't think that's an option, at least not without providing some
way for userspace to opt in to the new behaviour.