Re: [Linaro-mm-sig] [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access()

From: Paul Cercueil
Date: Wed Jan 24 2024 - 05:58:37 EST


Hi Christian,

Le mardi 23 janvier 2024 à 14:28 +0100, Christian König a écrit :
>  Am 23.01.24 um 14:02 schrieb Paul Cercueil:
>  
> > [SNIP]
> >  
> > >  
> > > >   
> > > > >  
> > > > > That an exporter has to call extra functions to access his
> > > > > own
> > > > > buffers
> > > > > is a complete no-go for the design since this forces
> > > > > exporters
> > > > > into
> > > > > doing extra steps for allowing importers to access their
> > > > > data.
> > > > >  
> > > >  
> > > > Then what about we add these dma_buf_{begin,end}_access(), with
> > > > only
> > > > implementations for "dumb" exporters e.g. udmabuf or the dmabuf
> > > > heaps?
> > > > And only importers (who cache the mapping and actually care
> > > > about
> > > > non-
> > > > coherency) would have to call these.
> > > >  
> > >  
> > > No, the problem is still that you would have to change all
> > > importers
> > > to
> > > mandatory use dma_buf_begin/end.
> > >
> > > But going a step back caching the mapping is irrelevant for
> > > coherency.
> > > Even if you don't cache the mapping you don't get coherency.
> > >  
> >  
> > You actually do - at least with udmabuf, as in that case
> > dma_buf_map_attachment() / dma_buf_unmap_attachment() will handle
> > cache
> > coherency when the SGs are mapped/unmapped.
> >  
>  
>  Well I just double checked the source in 6.7.1 and I can't see
> udmabuf doing anything for cache coherency in map/unmap.
>  
>  All it does is calling dma_map_sgtable() and dma_unmap_sgtable() to
> create and destroy the SG table and those are not supposed to sync
> anything to the CPU cache.
>  
>  In other words drivers usually use DMA_ATTR_SKIP_CPU_SYNC here, it's
> just that this is missing from udmabuf. 

Ok.
 
> >  
> > The problem was then that dma_buf_unmap_attachment cannot be called
> > before the dma_fence is signaled, and calling it after is already
> > too
> > late (because the fence would be signaled before the data is
> > sync'd).
> >  
>  
>  Well what sync are you talking about? CPU sync? In DMA-buf that is
> handled differently.
>  
>  For importers it's mandatory that they can be coherent with the
> exporter. That usually means they can snoop the CPU cache if the
> exporter can snoop the CPU cache.

I seem to have such a system where one device can snoop the CPU cache
and the other cannot. Therefore if I want to support it properly, I do
need cache flush/sync. I don't actually try to access the data using
the CPU (and when I do, I call the sync start/end ioctls).


>  For exporters you can implement the begin/end CPU access functions
> which allows you to implement something even if your exporting device
> can't snoop the CPU cache.

That only works if the importers call the begin_cpu_access() /
end_cpu_access(), which they don't.

 
> > Daniel / Sima suggested then that I cache the mapping and add new
> > functions to ensure cache coherency, which is what these patches
> > are
> > about.
> >  
>  
>  Yeah, I've now catched up on the latest mail. Sorry I haven't seen
> that before.
>  
>  
> >  
> >
> >  
> > >  
> > > In other words exporters are not require to call sync_to_cpu or
> > > sync_to_device when you create a mapping.
> > >
> > > What exactly is your use case here? And why does coherency
> > > matters?
> > >  
> >  
> > My use-case is, I create DMABUFs with udmabuf, that I attach to
> > USB/functionfs with the interface introduced by this patchset. I
> > attach
> > them to IIO with a similar interface (being upstreamed in
> > parallel),
> > and transfer data from USB to IIO and vice-versa in a zero-copy
> > fashion.
> >
> > This works perfectly fine as long as the USB and IIO hardware are
> > coherent between themselves, which is the case on most of our
> > boards.
> > However I do have a board (with a Xilinx Ultrascale SoC) where it
> > is
> > not the case, and cache flushes/sync are needed. So I was trying to
> > rework these new interfaces to work on that system too.
> >  
>  
>  Yeah, that sounds strongly like one of the use cases we have
> rejected so far.
>  
>  
>  
> >  
> > If this really is a no-no, then I am fine with the assumption that
> > devices sharing a DMABUF must be coherent between themselves; but
> > that's something that should probably be enforced rather than
> > assumed.
> >
> > (and I *think* there is a way to force coherency in the
> > Ultrascale's
> > interconnect - we're investigating it)
> >  
>  
>  What you can do is that instead of using udmabuf or dma-heaps is
> that the device which can't provide coherency act as exporters of the
> buffers.
>  
>  The exporter is allowed to call sync_for_cpu/sync_for_device on it's
> own buffers and also gets begin/end CPU access notfications. So you
> can then handle coherency between the exporter and the CPU.

But again that would only work if the importers would call
begin_cpu_access() / end_cpu_access(), which they don't, because they
don't actually access the data using the CPU.

Unless you mean that the exporter can call sync_for_cpu/sync_for_device
before/after every single DMA transfer so that the data appears
coherent to the importers, without them having to call
begin_cpu_access() / end_cpu_access().

In which case - this would still demultiply the complexity; my USB-
functionfs interface here (and IIO interface in the separate patchset)
are not device-specific, so I'd rather keep them importers.
 
>  If you really don't have coherency between devices then that would
> be a really new use case and we would need much more agreement on how
> to do this.

[snip]

Agreed. Desiging a good generic solution would be better.

With that said...

Let's keep it out of this USB-functionfs interface for now. The
interface does work perfectly fine on platforms that don't have
coherency problems. The coherency issue in itself really is a
tangential issue.

So I will send a v6 where I don't try to force the cache coherency -
and instead assume that the attached devices are coherent between
themselves.

But it would be even better to have a way to detect non-coherency and
return an error on attach.

Cheers,
-Paul