Re: [Linaro-mm-sig] [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access()

From: Paul Cercueil
Date: Tue Jan 23 2024 - 08:07:08 EST


On Tuesday 23 January 2024 at 12:52 +0100, Christian König wrote:
> On 23.01.24 at 11:10, Paul Cercueil wrote:
> > Hi Christian,
> >
> > On Monday 22 January 2024 at 14:41 +0100, Christian König wrote:
> > > On 22.01.24 at 12:01, Paul Cercueil wrote:
> > > > Hi Christian,
> > > >
> > > > On Monday 22 January 2024 at 11:35 +0100, Christian König wrote:
> > > > > On 19.01.24 at 15:13, Paul Cercueil wrote:
> > > > > > These functions should be used by device drivers when they start and
> > > > > > stop accessing the data of a DMABUF. They allow DMABUF importers to
> > > > > > cache the dma_buf_attachment while ensuring that the data they want
> > > > > > to access is available for their device when the DMA transfers take
> > > > > > place.
> > > > > As Daniel already noted as well, this is a complete no-go from the
> > > > > DMA-buf design point of view.
> > > > What do you mean, "as Daniel already noted"? It was him who suggested
> > > > this.
> > > Sorry, I hadn't fully caught up on the discussion then.
> > >
> > > In general, DMA-buf is built around the idea that the data can be
> > > accessed coherently by the involved devices.
> > >
> > > Having a begin/end of access for devices was brought up multiple times,
> > > but so far rejected for good reasons.
> > I would argue that if it was brought up multiple times, then there are
> > also good reasons to support such a mechanism.
> >
> > > That an exporter has to call extra functions to access its own buffers
> > > is a complete no-go for the design, since this forces exporters into
> > > doing extra steps to allow importers to access their data.
> > Then what if we add these dma_buf_{begin,end}_access(), with
> > implementations only for "dumb" exporters, e.g. udmabuf or the dmabuf
> > heaps? And only importers (who cache the mapping and actually care
> > about non-coherency) would have to call these.
>
> No, the problem is still that all importers would have to be changed to
> call dma_buf_begin/end() unconditionally.
>
> But going a step back, caching the mapping is irrelevant for coherency.
> Even if you don't cache the mapping, you don't get coherency.

You actually do - at least with udmabuf, as in that case
dma_buf_map_attachment() / dma_buf_unmap_attachment() will handle cache
coherency when the SGs are mapped/unmapped.

The problem was then that dma_buf_unmap_attachment() cannot be called
before the dma_fence is signaled, and calling it after is already too
late (because the fence would be signaled before the data is synced).

Daniel / Sima suggested then that I cache the mapping and add new
functions to ensure cache coherency, which is what these patches are
about.
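Concretely, the importer-side flow these patches enable looks roughly like this (a sketch only, which obviously won't build outside a kernel tree; `priv`, `my_start_transfer()` and `my_wait_transfer()` are hypothetical placeholders, not anything from the actual patches):

```c
/* The mapping is created once at setup time and cached in priv->sgt;
 * only the begin/end bracketing happens per transfer. */
static int my_do_transfer(struct my_priv *priv)
{
	int ret;

	/* Sync caches (if needed) before the hardware touches the buffer. */
	ret = dma_buf_begin_access(priv->attach, priv->sgt, DMA_FROM_DEVICE);
	if (ret)
		return ret;

	my_start_transfer(priv, priv->sgt);
	my_wait_transfer(priv);

	/* Sync again *before* the dma_fence is signaled, not after. */
	return dma_buf_end_access(priv->attach, priv->sgt, DMA_FROM_DEVICE);
}
```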

> In other words, exporters are not required to call sync_to_cpu or
> sync_to_device when you create a mapping.
>
> What exactly is your use case here? And why does coherency matters?

My use case is: I create DMABUFs with udmabuf, which I attach to
USB/functionfs with the interface introduced by this patchset. I attach
them to IIO with a similar interface (being upstreamed in parallel),
and transfer data from USB to IIO and vice versa in a zero-copy
fashion.

This works perfectly fine as long as the USB and IIO hardware are
coherent with each other, which is the case on most of our boards.
However, I do have a board (with a Xilinx Ultrascale SoC) where that is
not the case, and cache flushes/syncs are needed. So I was trying to
rework these new interfaces to work on that system too.

If this really is a no-no, then I am fine with the assumption that
devices sharing a DMABUF must be coherent between themselves; but
that's something that should probably be enforced rather than assumed.

(and I *think* there is a way to force coherency in the Ultrascale's
interconnect - we're investigating it)
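(For illustration, on such a non-coherent system a "dumb" exporter's
callbacks could plausibly reduce to streaming-DMA syncs of the cached
mapping. This is a sketch of the idea under that assumption, not the
actual udmabuf implementation from these patches:)

```c
/* Hypothetical exporter callbacks for a non-coherent system: bracket
 * the hardware access with streaming-DMA cache maintenance on the
 * importer's cached sg_table. */
static int my_exporter_begin_access(struct dma_buf_attachment *attach,
				    struct sg_table *sgt,
				    enum dma_data_direction dir)
{
	/* Make CPU-side data visible to the device (writeback/invalidate). */
	dma_sync_sgtable_for_device(attach->dev, sgt, dir);
	return 0;
}

static int my_exporter_end_access(struct dma_buf_attachment *attach,
				  struct sg_table *sgt,
				  enum dma_data_direction dir)
{
	/* Discard stale CPU cache lines after a device write. */
	dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
	return 0;
}
```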

Cheers,
-Paul

> > At the very least, is there a way to check that "the data can be
> > accessed coherently by the involved devices"? So that my importer can
> > return -EPERM if there is no coherency vs. a device that's already
> > attached.
>
> Yeah, there is functionality for this in the DMA subsystem. I once
> created prototype patches for enforcing the same coherency approach
> between importer and exporter, but we never got around to upstreaming
> them.
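(As an aside, an enforcement check along those lines could presumably
compare per-device coherency at attach time. dev_is_dma_coherent() does
exist in <linux/dma-map-ops.h> but is private to the DMA core and not
exported to modules, so this is only a sketch of the idea, not
something dma-buf can do today:)

```c
/* Sketch: reject attachments that would mix coherent and non-coherent
 * devices on one buffer. Would need new DMA-core support to be usable
 * from dma-buf. */
static int check_same_coherency(struct dma_buf *dmabuf, struct device *dev)
{
	struct dma_buf_attachment *attach;

	list_for_each_entry(attach, &dmabuf->attachments, node)
		if (dev_is_dma_coherent(attach->dev) != dev_is_dma_coherent(dev))
			return -EPERM;

	return 0;
}
```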
>
>
>
> >
> > Cheers,
> > -Paul
> >
> > > That in turn is pretty much untestable unless you have every possible
> > > importer around while testing the exporter.
> > >
> > > Regards,
> > > Christian.
> > >
> > > > > Regards,
> > > > > Christian.
> > > > Cheers,
> > > > -Paul
> > > >
> > > > > > Signed-off-by: Paul Cercueil <paul@xxxxxxxxxxxxxxx>
> > > > > >
> > > > > > ---
> > > > > > v5: New patch
> > > > > > ---
> > > > > >  drivers/dma-buf/dma-buf.c | 66 +++++++++++++++++++++++++++++++++++++++
> > > > > >  include/linux/dma-buf.h   | 37 ++++++++++++++++++++++
> > > > > >  2 files changed, 103 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > > > > index 8fe5aa67b167..a8bab6c18fcd 100644
> > > > > > --- a/drivers/dma-buf/dma-buf.c
> > > > > > +++ b/drivers/dma-buf/dma-buf.c
> > > > > > @@ -830,6 +830,8 @@ static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach,
> > > > > >   *     - dma_buf_mmap()
> > > > > >   *     - dma_buf_begin_cpu_access()
> > > > > >   *     - dma_buf_end_cpu_access()
> > > > > > + *     - dma_buf_begin_access()
> > > > > > + *     - dma_buf_end_access()
> > > > > >   *     - dma_buf_map_attachment_unlocked()
> > > > > >   *     - dma_buf_unmap_attachment_unlocked()
> > > > > >   *     - dma_buf_vmap_unlocked()
> > > > > > @@ -1602,6 +1604,70 @@ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
> > > > > >
> > > > > > +/**
> > > > > > + * dma_buf_begin_access - Call before any hardware access from/to the DMABUF
> > > > > > + * @attach:	[in]	attachment used for hardware access
> > > > > > + * @sgt:	[in]	scatterlist used for the DMA transfer
> > > > > > + * @dir:	[in]	direction of DMA transfer
> > > > > > + */
> > > > > > +int dma_buf_begin_access(struct dma_buf_attachment *attach,
> > > > > > +			 struct sg_table *sgt, enum dma_data_direction dir)
> > > > > > +{
> > > > > > +	struct dma_buf *dmabuf;
> > > > > > +	bool cookie;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	if (WARN_ON(!attach))
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	dmabuf = attach->dmabuf;
> > > > > > +
> > > > > > +	if (!dmabuf->ops->begin_access)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	cookie = dma_fence_begin_signalling();
> > > > > > +	ret = dmabuf->ops->begin_access(attach, sgt, dir);
> > > > > > +	dma_fence_end_signalling(cookie);
> > > > > > +
> > > > > > +	if (WARN_ON_ONCE(ret))
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_NS_GPL(dma_buf_begin_access, DMA_BUF);
> > > > > > +
> > > > > > +/**
> > > > > > + * dma_buf_end_access - Call after any hardware access from/to the DMABUF
> > > > > > + * @attach:	[in]	attachment used for hardware access
> > > > > > + * @sgt:	[in]	scatterlist used for the DMA transfer
> > > > > > + * @dir:	[in]	direction of DMA transfer
> > > > > > + */
> > > > > > +int dma_buf_end_access(struct dma_buf_attachment *attach,
> > > > > > +		       struct sg_table *sgt, enum dma_data_direction dir)
> > > > > > +{
> > > > > > +	struct dma_buf *dmabuf;
> > > > > > +	bool cookie;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	if (WARN_ON(!attach))
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	dmabuf = attach->dmabuf;
> > > > > > +
> > > > > > +	if (!dmabuf->ops->end_access)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	cookie = dma_fence_begin_signalling();
> > > > > > +	ret = dmabuf->ops->end_access(attach, sgt, dir);
> > > > > > +	dma_fence_end_signalling(cookie);
> > > > > > +
> > > > > > +	if (WARN_ON_ONCE(ret))
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_NS_GPL(dma_buf_end_access, DMA_BUF);
> > > > > > +
> > > > > >  #ifdef CONFIG_DEBUG_FS
> > > > > >  static int dma_buf_debug_show(struct seq_file *s, void *unused)
> > > > > >  {
> > > > > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > > > > index 8ff4add71f88..8ba612c7cc16 100644
> > > > > > --- a/include/linux/dma-buf.h
> > > > > > +++ b/include/linux/dma-buf.h
> > > > > > @@ -246,6 +246,38 @@ struct dma_buf_ops {
> > > > > >  	 */
> > > > > >  	int (*end_cpu_access)(struct dma_buf *, enum dma_data_direction);
> > > > > >
> > > > > > +	/**
> > > > > > +	 * @begin_access:
> > > > > > +	 *
> > > > > > +	 * This is called from dma_buf_begin_access() when a device driver
> > > > > > +	 * wants to access the data of the DMABUF. The exporter can use this
> > > > > > +	 * to flush/sync the caches if needed.
> > > > > > +	 *
> > > > > > +	 * This callback is optional.
> > > > > > +	 *
> > > > > > +	 * Returns:
> > > > > > +	 *
> > > > > > +	 * 0 on success or a negative error code on failure.
> > > > > > +	 */
> > > > > > +	int (*begin_access)(struct dma_buf_attachment *, struct sg_table *,
> > > > > > +			    enum dma_data_direction);
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @end_access:
> > > > > > +	 *
> > > > > > +	 * This is called from dma_buf_end_access() when a device driver is
> > > > > > +	 * done accessing the data of the DMABUF. The exporter can use this
> > > > > > +	 * to flush/sync the caches if needed.
> > > > > > +	 *
> > > > > > +	 * This callback is optional.
> > > > > > +	 *
> > > > > > +	 * Returns:
> > > > > > +	 *
> > > > > > +	 * 0 on success or a negative error code on failure.
> > > > > > +	 */
> > > > > > +	int (*end_access)(struct dma_buf_attachment *, struct sg_table *,
> > > > > > +			  enum dma_data_direction);
> > > > > > +
> > > > > >  	/**
> > > > > >  	 * @mmap:
> > > > > >  	 *
> > > > > > @@ -606,6 +638,11 @@ void dma_buf_detach(struct dma_buf *dmabuf,
> > > > > >  int dma_buf_pin(struct dma_buf_attachment *attach);
> > > > > >  void dma_buf_unpin(struct dma_buf_attachment *attach);
> > > > > >
> > > > > > +int dma_buf_begin_access(struct dma_buf_attachment *attach,
> > > > > > +			 struct sg_table *sgt, enum dma_data_direction dir);
> > > > > > +int dma_buf_end_access(struct dma_buf_attachment *attach,
> > > > > > +		       struct sg_table *sgt, enum dma_data_direction dir);
> > > > > > +
> > > > > >  struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info);
> > > > > >
> > > > > >  int dma_buf_fd(struct dma_buf *dmabuf, int flags);
>