Re: [PATCH v3 3/4] usb: gadget: functionfs: Add DMABUF import interface

From: Paul Cercueil
Date: Thu Jan 18 2024 - 14:40:00 EST


Hi Daniel / Sima,

On Thu, Jan 18 2024 at 14:59 +0100, Daniel Vetter wrote:
> On Thu, Jan 18, 2024 at 02:56:31PM +0100, Daniel Vetter wrote:
> > On Mon, Jan 15, 2024 at 01:54:27PM +0100, Paul Cercueil wrote:
> > > Hi Daniel / Sima,
> > >
> > > On Tue, Jan 9 2024 at 14:01 +0100, Daniel Vetter wrote:
> > > > On Tue, Jan 09, 2024 at 12:06:58PM +0100, Paul Cercueil wrote:
> > > > > Hi Daniel / Sima,
> > > > >
> > > > > On Mon, Jan 8 2024 at 20:19 +0100, Daniel Vetter wrote:
> > > > > > On Mon, Jan 08, 2024 at 05:27:33PM +0100, Paul Cercueil wrote:
> > > > > > > On Mon, Jan 8 2024 at 16:29 +0100, Daniel Vetter wrote:
> > > > > > > > On Mon, Jan 08, 2024 at 03:21:21PM +0100, Paul Cercueil wrote:
> > > > > > > > > Hi Daniel (Sima?),
> > > > > > > > >
> > > > > > > > > On Mon, Jan 8 2024 at 13:39 +0100, Daniel Vetter wrote:
> > > > > > > > > > On Mon, Jan 08, 2024 at 01:00:55PM +0100, Paul Cercueil wrote:
> > > > > > > > > > > +static void ffs_dmabuf_signal_done(struct ffs_dma_fence *dma_fence, int ret)
> > > > > > > > > > > +{
> > > > > > > > > > > +	struct ffs_dmabuf_priv *priv = dma_fence->priv;
> > > > > > > > > > > +	struct dma_fence *fence = &dma_fence->base;
> > > > > > > > > > > +
> > > > > > > > > > > +	dma_fence_get(fence);
> > > > > > > > > > > +	fence->error = ret;
> > > > > > > > > > > +	dma_fence_signal(fence);
> > > > > > > > > > > +
> > > > > > > > > > > +	dma_buf_unmap_attachment(priv->attach, dma_fence->sgt, dma_fence->dir);
> > > > > > > > > > > +	dma_fence_put(fence);
> > > > > > > > > > > +	ffs_dmabuf_put(priv->attach);
> > > > > > > > > >
> > > > > > > > > > So this can in theory take the dma_resv lock, and
> > > > > > > > > > if the usb completion isn't an unlimited worker
> > > > > > > > > > this could hold up completion of future dma_fence,
> > > > > > > > > > resulting in a deadlock.
> > > > > > > > > >
> > > > > > > > > > Needs to be checked how usb works, and if stalling
> > > > > > > > > > indefinitely in the io_complete callback can hold
> > > > > > > > > > up the usb stack you need to:
> > > > > > > > > >
> > > > > > > > > > - drop dma_fence_begin/end_signalling annotations
> > > > > > > > > >   in here
> > > > > > > > > > - pull out the unref stuff into a separate
> > > > > > > > > >   preallocated worker (or at least the final unrefs
> > > > > > > > > >   for ffs_dma_buf).
> > > > > > > > >
> > > > > > > > > Only ffs_dmabuf_put() can attempt to take the
> > > > > > > > > dma_resv and would have to be in a worker, right?
> > > > > > > > > Everything else would be inside the
> > > > > > > > > dma_fence_begin/end_signalling() annotations?
> > > > > > > >
> > > > > > > > Yup. Also I noticed that unlike the iio patches you
> > > > > > > > don't
> > > > > > > > have
> > > > > > > > the
> > > > > > > > dma_buf_unmap here in the completion path (or I'm
> > > > > > > > blind?),
> > > > > > > > which
> > > > > > > > helps a
> > > > > > > > lot with avoiding trouble.
> > > > > > >
> > > > > > > They both call dma_buf_unmap_attachment() in the "signal
> > > > > > > done" callback; the only difference I see is that it is
> > > > > > > called after the dma_fence_put() in the iio patches,
> > > > > > > while it's called before dma_fence_put() here.
> > > > > >
> > > > > > I was indeed blind ...
> > > > > >
> > > > > > So the trouble is this won't work because:
> > > > > > - dma_buf_unmap_attachment() requires dma_resv_lock. This
> > > > > >   is a somewhat recent-ish change from 47e982d5195d
> > > > > >   ("dma-buf: Move dma_buf_map_attachment() to dynamic
> > > > > >   locking specification"), so maybe old kernel or you don't
> > > > > >   have full lockdep enabled to get the right splat.
> > > > > >
> > > > > > - dma_fence critical section forbids dma_resv_lock
> > > > > >
> > > > > > Which means you need to move this out, but then there's the
> > > > > > potential cache management issue. Current gpu drivers just
> > > > > > kinda ignore that, because it doesn't matter for their
> > > > > > use-cases: they all cache the mapping for about as long as
> > > > > > the attachment exists. You might want to do the same,
> > > > > > unless that somehow breaks a use-case you have; I have no
> > > > > > idea about that. If something breaks with unmap_attachment
> > > > > > moved out of the fence handling, then I guess it's high
> > > > > > time to add separate cache-management-only functions to
> > > > > > dma_buf (and that's probably going to be quite some wiring
> > > > > > up; not sure even how easy that would be to do, nor what
> > > > > > exactly the interface should look like).
> > > > >
> > > > > Ok. Then I'll just cache the mapping for now, I think.
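
(For context, "caching the mapping" ends up looking something like the
sketch below; names are illustrative, not the actual patch code: map
once when the DMABUF is attached, reuse the sg_table for every
transfer, and unmap only on detach.)

static int ffs_dmabuf_map_cached(struct ffs_dmabuf_priv *priv,
				 enum dma_data_direction dir)
{
	struct sg_table *sgt;

	/* Takes the dma_resv lock internally; fine in the ioctl path */
	sgt = dma_buf_map_attachment_unlocked(priv->attach, dir);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	priv->sgt = sgt;	/* reused for every subsequent transfer */
	priv->dir = dir;

	return 0;
}

static void ffs_dmabuf_unmap_cached(struct ffs_dmabuf_priv *priv)
{
	/* Detach/release path, in process context */
	dma_buf_unmap_attachment_unlocked(priv->attach, priv->sgt,
					  priv->dir);
}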
> > > >
> > > > Yeah I think that's simplest. I did ponder a bit and I don't
> > > > think it'd be too much pain to add the cache-management
> > > > functions for device attachments/mappings. But it would be
> > > > quite some typing ...
> > > > -Sima
> > >
> > > It looks like I actually do have some hardware which requires
> > > the cache management. If I cache the mappings in both my IIO and
> > > USB code, it works fine on my ZedBoard, but it doesn't work on
> > > my ZCU102.
> > >
> > > (Or maybe it's something else? What I get from USB in that case
> > > is a stream of zeros; I'd expect it to be more like a stream of
> > > garbage/stale data.)
> > >
> > > So, change of plans: I will now unmap the attachment in the
> > > cleanup worker after the fence is signalled, and add a warning
> > > comment before the end of the fence critical section about the
> > > need to do cache management before the signal.
> > >
> > > Does that work for you?
> >
> > The trouble is, I'm not sure this works for you. If you rely on
> > the fences, and you have to do cache management in between dma
> > operations, then doing the unmap somewhen later will only mostly
> > paper over the issue, but not consistently.
> >
> > I think that's really bad, because the bugs this will cause are
> > very hard to track down and, with the current infrastructure,
> > impossible to fix.
> >
> > Imo cache the mappings, and then fix the cache management bug
> > properly.
> >
> > If you want an interim solution that isn't blocked on the dma-buf
> > cache management api addition, the only thing that works is doing
> > the operations synchronously in the ioctl call. Then you don't
> > need fences, and you can guarantee that the unmap has finished
> > before userspace proceeds.
> >
> > With the dma_fences you can't guarantee that; it's just pure luck.
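
To make sure we're talking about the same thing, the deferred-unmap
flow I had in mind was roughly the following (simplified sketch, not
the actual patch; the struct layout and helper names are illustrative):

struct ffs_dma_fence {
	struct dma_fence base;
	struct ffs_dmabuf_priv *priv;
	struct sg_table *sgt;
	enum dma_data_direction dir;
	struct work_struct work;	/* INIT_WORK()'ed at fence creation */
};

static void ffs_dmabuf_cleanup_worker(struct work_struct *work)
{
	struct ffs_dma_fence *dma_fence =
		container_of(work, struct ffs_dma_fence, work);
	struct ffs_dmabuf_priv *priv = dma_fence->priv;

	/* Process context, outside the fence critical section, so
	 * taking dma_resv for the unmap is allowed here.
	 */
	dma_buf_unmap_attachment_unlocked(priv->attach, dma_fence->sgt,
					  dma_fence->dir);
	dma_fence_put(&dma_fence->base);
	ffs_dmabuf_put(priv->attach);
}

static void ffs_dmabuf_signal_done(struct ffs_dma_fence *dma_fence, int ret)
{
	struct dma_fence *fence = &dma_fence->base;
	bool cookie = dma_fence_begin_signalling();

	fence->error = ret;
	dma_fence_signal(fence);
	dma_fence_end_signalling(cookie);

	/* Everything that may take dma_resv is deferred to the worker */
	queue_work(system_wq, &dma_fence->work);
}

But point taken: that only moves the unmap out of the fence critical
section; it gives userspace no guarantee about when the unmap (and any
cache management) has actually happened.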
>
> Maybe a follow up: double-check you really need the cache management
> between the dma operations from 2 different devices, and not for the
> cpu access that you then probably do to check the result.
>
> Because if the issue is just cpu access, then protecting the cpu
> access needs to use the begin/end_cpu_access dma-buf functions (or
> the corresponding ioctl if you use mmap from userspace) anyway, and
> that should sort out any issues you have for cpu access.
>
> Just to make sure we're not needlessly trying to fix something that
> isn't actually the problem.

I am not doing any CPU access; I'm just attaching the same DMABUF to
both IIO and USB and using the new IOCTLs to transfer data.
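
(If the CPU did touch the buffer through mmap(), I agree the right
bracketing would be the standard dma-buf sync ioctl, roughly as below;
but there is no mmap() involved here.)

#include <linux/dma-buf.h>
#include <sys/ioctl.h>

static int dmabuf_cpu_access_begin(int dmabuf_fd)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

static int dmabuf_cpu_access_end(int dmabuf_fd)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}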

Can I just roll my own cache management then, using
dma_sync_sg_for_cpu/device? I did a quick-and-dirty check with it, and
it seems to make things work with cached mappings.
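
The quick-and-dirty check was something along these lines (sketch
only; priv->dev stands in for whatever struct device actually performs
the DMA, and the helper names are made up):

/* Keep the attachment mapped, and bracket each transfer with the
 * streaming-DMA sync calls. Note dma_sync_sg_for_*() must be given
 * the same nents that was originally mapped, hence orig_nents.
 */
static void ffs_dmabuf_sync_for_device(struct ffs_dmabuf_priv *priv,
				       struct sg_table *sgt,
				       enum dma_data_direction dir)
{
	/* Hand the (cached) mapping over to the device before DMA */
	dma_sync_sg_for_device(priv->dev, sgt->sgl, sgt->orig_nents, dir);
}

static void ffs_dmabuf_sync_for_cpu(struct ffs_dmabuf_priv *priv,
				    struct sg_table *sgt,
				    enum dma_data_direction dir)
{
	/* Take ownership back once the transfer has completed */
	dma_sync_sg_for_cpu(priv->dev, sgt->sgl, sgt->orig_nents, dir);
}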

> -Sima

Cheers,
-Paul