Re: [PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes

From: Liam Mark
Date: Tue Jan 22 2019 - 17:47:09 EST


On Tue, 22 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 4:18 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/21/19 2:20 PM, Liam Mark wrote:
> >>> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/21/19 1:44 PM, Liam Mark wrote:
> >>>>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >>>>>
> >>>>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>>>>>> And who is going to decide which ones to pass? And who documents
> >>>>>>>> which ones are safe?
> >>>>>>>>
> >>>>>>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>>>>>> might get translated to the DMA API flags, which are not error checked,
> >>>>>>>> not very well documented and way to easy to get wrong.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I'm not sure having flags in dma-buf really solves anything
> >>>>>>> given drivers can use the attributes directly with dma_map
> >>>>>>> anyway, which is what we're looking to do. The intention
> >>>>>>> is for the driver creating the dma_buf attachment to have
> >>>>>>> the knowledge of which flags to use.
> >>>>>>
> >>>>>> Well, there are very few flags that you can simply use for all calls of
> >>>>>> dma_map*. And given how badly these flags are defined I just don't want
> >>>>>> people to add more places where they indirectly use these flags, as
> >>>>>> it will be more than enough work to clean up the current mess.
> >>>>>>
> >>>>>> What flag(s) do you want to pass this way, btw? Maybe that is where
> >>>>>> the problem is.
> >>>>>>
> >>>>>
> >>>>> The main use case is for allowing clients to pass in
> >>>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>>>> ION the buffers aren't usually accessed from the CPU so this allows
> >>>>> clients to often avoid doing unnecessary cache maintenance.
> >>>>>
> >>>>
> >>>> How can a client know that no CPU access has occurred that needs to be
> >>>> flushed out?
> >>>>
> >>>
> >>> I have left this to clients, but if they own the buffer they can have the
> >>> knowledge as to whether CPU access is needed in that use case (example for
> >>> post-processing).
> >>>
> >>> For example with the previous version of ION we left all decisions of
> >>> whether cache maintenance was required up to the client, they would use
> >>> the ION cache maintenance IOCTL to force cache maintenance only when it
> >>> was required.
> >>> In these cases almost all of the access was being done by the device and
> >>> in the rare cases CPU access was required clients would initiate the
> >>> required cache maintenance before and after the CPU access.
> >>>
> >>
> >> I think we have different definitions of "client", I'm talking about the
> >> DMA-BUF client (the importer), that is who can set this flag. It seems
> >> you mean the userspace application, which has no control over this flag.
> >>
> >
> > I am also talking about dma-buf clients, I am referring to both the
> > userspace and kernel component of the client. For example our Camera ION
> > client has both a usersapce and kernel component and they have ION
> > buffers, which they control the access to, which may or may not be
> > accessed by the CPU in certain uses cases.
> >
>
> I know they often work together, but for this discussion it would be
> good to keep kernel clients and usperspace clients separate. There are
> three types of actors at play here, userspace clients, kernel clients,
> and exporters.
>
> DMA-BUF only provides the basic sync primitive + mmap directly to
> userspace,

Well dma-buf does provide dma_buf_kmap/dma_buf_begin_cpu_access which
allows the same fucntionality in the kernel, but I don't think that changes
your argument.

> both operations are fulfilled by the exporter. This patch is
> about adding more control to the kernel side clients. The kernel side
> clients cannot know what userspace or other kernel side clients have
> done with the buffer, *only* the exporter has the whole picture.
>
> Therefor neither type of client should be deciding if the CPU needs
> flushed or not, only the exporter, based on the type of buffer, the
> current set attachments, and previous actions (is this first attachment,
> CPU get access in-between, etc...) can make this decision.
>
> You goal seems to be to avoid unneeded CPU side CMOs when a device
> detaches and another attaches with no CPU access in-between, right?
> That's reasonable to me, but it must be the exporter who keeps track and
> skips the CMO. This patch allows the client to tell the exporter the CMO
> is not needed and that is not safe.
>

I agree it would be better have this logic in the exporter, but I just
haven't heard an upstreamable way to make that work.
But maybe to explore that a bit more.

If we consider having CPU access with no devices attached a legitimate use
case:

The pipelining use case I am thinking of is
1) dev 1 attach, map, access, unmap
2) dev 1 detach
3) (maybe) CPU access
4) dev 2 attach
5) dev 2 map, access
6) ...

It would be unfortunate to not consider this something legitimate for
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there
isn't necessarily a nice place to tell them when to detach.

If we considered the above a supported use case I think we could support
it in dma-buf (based on past discussions) if we had 2 things

#1 if we tracked the state of the buffer (example if it has had a previous
cached/uncached write and no following CMO). Then when either the CPU or
a device was going to access a buffer it could decide, based on the
previous access if any CMO needs to be applied first.

#2 we had a non-architecture specific way to apply cache maintenance
without a device, so that in step #3 the begin_cpu_acess call could
successfully invalidate the buffer.

I think #1 is doable since we can tell tell if devices are IO coherent or
not and we know the direction of accesses in dma map and begin cpu access.

I think we would probably agree that #2 is a problem though, getting the
kernel to expose that API seems like a hard argument.

Liam

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project