Re: [PATCH v2] usb: gadget: f_fs: Use stream_open() for endpoint files

From: Pavan Kondeti
Date: Thu Nov 11 2021 - 22:17:44 EST


Hi Greg,

On Thu, Nov 11, 2021 at 02:12:28PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Nov 11, 2021 at 05:45:56PM +0530, Pavankumar Kondeti wrote:
> > Function fs endpoint files do not have the notion of file position.
> > So switch to stream like functionality. This allows concurrent threads
> > to be blocked in the ffs read/write operations which use ffs_mutex_lock().
> > The ffs mutex lock deploys interruptible wait. Otherwise, threads are
> > blocking for the mutex lock in __fdget_pos(). For whatever reason, if the
> > host does not send/receive data for a long time, hung task warnings
> > are observed.
>
> So the current code is broken? What commit caused it to break?

This is not a serious bug that affects functionality. If the hung_task_panic
sysctl is not enabled, probably nobody would notice it apart from an obscure
warning in the kernel log (dmesg). It is all about the task state while the
task is blocked for I/O. The function fs code uses an interruptible wait, but
we never get there: the task is blocked earlier, in the VFS layer, because of
the commit below, which was introduced in the 3.14 kernel.

commit 9c225f2655e36a470c4f58dbbc99244c5fc7f2d4
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Mon Mar 3 09:36:58 2014 -0800

vfs: atomic f_pos accesses as per POSIX

Our write() system call has always been atomic in the sense that you get
the expected thread-safe contiguous write, but we haven't actually
guaranteed that concurrent writes are serialized wrt f_pos accesses, so
threads (or processes) that share a file descriptor and use "write()"
concurrently would quite likely overwrite each others data.

We uncovered this issue through a customer bug report; it happens very rarely,
only when the host does not pull the data for a very long time. Since function
fs does not care about the file position, I thought stream_open() was the right
thing to do here.
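
To make the mechanism concrete, here is a rough userspace model of the two
pieces involved. It is only a sketch: struct model_file, the MODEL_* flag
values and both helper names are made up for illustration and are not the
kernel's definitions. It assumes that __fdget_pos() takes the f_pos lock only
for files that have FMODE_ATOMIC_POS set and are shared, and that
stream_open() clears that flag and marks the file as a stream.

```c
#include <stdbool.h>

/* Illustrative flag bits; the values are NOT the kernel's. */
#define MODEL_FMODE_ATOMIC_POS (1u << 0)
#define MODEL_FMODE_STREAM     (1u << 1)

/* Hypothetical stand-in for the parts of struct file that matter here. */
struct model_file {
	unsigned int f_mode;	/* mode flags set at open time */
	long refcount;		/* > 1 when the file is shared between threads */
};

/* Models the effect of stream_open() on f_mode (only the bits tracked here). */
static void model_stream_open(struct model_file *f)
{
	f->f_mode &= ~MODEL_FMODE_ATOMIC_POS;
	f->f_mode |= MODEL_FMODE_STREAM;
}

/*
 * Models the check assumed to happen in __fdget_pos(): returns true when a
 * read()/write() caller would first have to take the f_pos lock in the VFS,
 * before the driver's own (interruptible) locking is ever reached.
 */
static bool model_takes_pos_lock(const struct model_file *f)
{
	return (f->f_mode & MODEL_FMODE_ATOMIC_POS) && f->refcount > 1;
}
```

In this model, a shared file with MODEL_FMODE_ATOMIC_POS set reports that the
VFS lock is taken; after model_stream_open() it no longer does, so callers go
straight to the driver's read/write path.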

>
> Doesn't this change cause a change in behavior for existing userspace
> tools, or will they still work as-is?
>

I don't think it affects user space, as it just changes the task state from
UNINTERRUPTIBLE to INTERRUPTIBLE while waiting for the USB transfers to
finish. However, there is a slight change in the O_NONBLOCK behavior.
Earlier, threads using O_NONBLOCK were also getting blocked inside
fdget_pos(). Now they reach f_fs and an error code is returned. IOW,
we are actually fixing the non-blocking behavior here.
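
For user space, this means a thread using O_NONBLOCK should be prepared to see
an error instead of sleeping. A minimal sketch of how a caller might handle
that is below. It assumes the error reported is EAGAIN (the usual convention
for O_NONBLOCK); try_write_nonblock() is a hypothetical helper name, not part
of any existing tool.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: attempt one write on a descriptor that was opened
 * (or later marked) with O_NONBLOCK. Intended for len > 0.
 *
 * Returns the number of bytes written, 0 if the call would have blocked
 * (assumed to be reported as EAGAIN/EWOULDBLOCK), or -1 on any other error
 * with errno left untouched.
 */
static ssize_t try_write_nonblock(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;	/* busy: caller can poll() or retry later */
	return n;
}
```

A caller that gets 0 back can do other work and retry, rather than being stuck
in an uninterruptible sleep.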

PS: I believe you asked these questions because the commit description does not
cover them. I can happily add all this information to it. Since it is all
historical, I did not mention it.

Thanks,
Pavan