Re: [RFC v1 3/4] swiotlb: Allow dynamic allocation of bounce buffers

From: Christoph Hellwig
Date: Mon Apr 10 2023 - 23:51:41 EST


On Fri, Apr 07, 2023 at 12:46:27PM +0200, Petr Tesařík wrote:
> > b) find a way to migrate a buffer into other memory, similar to
> > how page migration works for page cache
>
> Let me express the idea in my own words to make sure I get it right.
> When a DMA buffer is imported, but before it is ultimately pinned in
> memory, the importing device driver checks whether the buffer meets its
> DMA constraints. If not, it calls a function provided by the exporting
> device driver to migrate the buffer.

Yes.
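
To make the flow concrete, here is a rough userspace sketch of the
import-time check, a toy model rather than kernel code. Every name in it
(struct toy_buf, toy_import, toy_exporter_migrate, TOY_EXPORTER_BASE) is
hypothetical and only illustrates the idea: the importer checks its DMA
mask, asks the exporter to migrate if needed, and gets an error code when
the two address ranges do not overlap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_buf {
	uint64_t addr;	/* current bus address of the buffer */
	uint64_t size;	/* length in bytes */
	/* Exporter callback: move the buffer so it ends at or below 'limit'. */
	int (*migrate)(struct toy_buf *buf, uint64_t limit);
};

/* True if the whole buffer is addressable with the given DMA mask. */
static bool toy_buf_fits(const struct toy_buf *buf, uint64_t dma_mask)
{
	return buf->size != 0 && buf->addr <= dma_mask &&
	       buf->size - 1 <= dma_mask - buf->addr;
}

/* Importer side: check constraints before pinning, migrate if needed. */
static int toy_import(struct toy_buf *buf, uint64_t dma_mask)
{
	int ret;

	assert(buf != NULL);
	if (toy_buf_fits(buf, dma_mask))
		return 0;
	if (!buf->migrate)
		return -EOPNOTSUPP;
	ret = buf->migrate(buf, dma_mask);
	if (ret)
		return ret;
	/* Re-check, the exporter may have placed it somewhere unsuitable. */
	return toy_buf_fits(buf, dma_mask) ? 0 : -EIO;
}

/* Assumed exporter that can only place buffers at a fixed base address. */
#define TOY_EXPORTER_BASE 0x40000000ULL

static int toy_exporter_migrate(struct toy_buf *buf, uint64_t limit)
{
	/* No overlap between exporter memory and importer mask: give up. */
	if (limit < TOY_EXPORTER_BASE ||
	    buf->size - 1 > limit - TOY_EXPORTER_BASE)
		return -ENOMEM;
	buf->addr = TOY_EXPORTER_BASE;
	return 0;
}
```

With a 32-bit mask, a buffer sitting at 4 GiB would be moved to the
exporter's base; with a 30-bit mask the same call fails with -ENOMEM,
which is the "no overlap" case from point 2 below.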

> This makes sense, but:
>
> 1) The operation must be implemented in the exporting driver; this
> will take some time.
>
> 2) In theory, there may be no overlap between the exporting device
> and the importing device. OTOH I'm not aware of any real-world
> example, so we can probably return a suitable error code, and
> that's it.

Indeed. And if there is no overlap, which as you said is very
unlikely but possible in theory, we could still keep migrating
back and forth.

One important thing that we should do is to consolidate more of the
dma-buf implementation code. Right now the implementations just seem
to be a wild mess of copy-and-pasted boilerplate code, unfortunately.

> Anyway, I have already written in another reply that my original use
> case is moot, because a more recent distribution can do the job without
> using dma-buf, so it has been fixed in user space, be it in GNOME,
> pipewire, or Mesa (I don't really have to know).
>
> At this point I would go with the assumption that large buffers
> allocated by media subsystems will not hit swiotlb. Consequently, I
> don't plan to spend more time on this branch of the story.

Sounds fine to me, and thanks for taking the effort so far.

> > > BTW my testing also suggests that the streaming DMA API is quite
> > > inefficient, because UAS performance _improved_ with swiotlb=force.
> > > Sure, this should probably be addressed in the UAS and/or xHCI driver,
> > > but what I mean is that moving away from swiotlb may even cause
> > > performance regressions, which is counter-intuitive. At least I would
> > > _not_ have expected it.
> >
> > That is indeed very odd. Are you running with a very slow iommu
> > driver there? Or what is the actual use case there in general?
>
> This was on a Raspberry Pi 4, which does not have any IOMMU. IOW it
> looks like copying data around can be faster than sending it straight
> to the device. When I have some more time, I must investigate what is
> really happening there, because it does not make any sense to me.

If you're not using an IOMMU, that doesn't make any sense to me
either. swiotlb calls into exactly the same routines as dma-direct
does for the DMA setup on each I/O, just after copying the data. So
if you do have some spare cycles to investigate what is going on here,
I'd be really curious about the results.
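
To spell out why the result is surprising, here is a toy userspace model
of the two paths, not the real kernel code. All names (toy_direct_map,
toy_bounce_map, toy_bounce_pool, toy_setup_calls) are made up for
illustration, and the "setup" step is only a counter standing in for the
per-I/O work. The point is that the bounce path does a copy and then the
very same setup, so on paper it can only add cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; these are not kernel symbols. */
static unsigned int toy_setup_calls;		/* counts per-I/O setup work */
static unsigned char toy_bounce_pool[4096];	/* stand-in bounce buffer */

/* Stand-in for the per-I/O DMA setup done on the direct path. */
static uintptr_t toy_direct_map(const void *cpu_addr, size_t len)
{
	(void)len;
	toy_setup_calls++;
	return (uintptr_t)cpu_addr;
}

/* Bounce path: copy the data first, then do the same setup as above. */
static uintptr_t toy_bounce_map(const void *cpu_addr, size_t len)
{
	assert(len <= sizeof(toy_bounce_pool));
	memcpy(toy_bounce_pool, cpu_addr, len);
	return toy_direct_map(toy_bounce_pool, len);
}
```

In this model each toy_bounce_map() call costs one memcpy() plus one
toy_direct_map() call, so any speedup with swiotlb=force would have to
come from a property of the bounce memory itself or from something
outside this picture.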