Re: DMA-buf and uncached system memory

From: Christian König
Date: Thu Jun 23 2022 - 03:00:02 EST


On 23.06.22 at 01:34, Daniel Stone wrote:
Hi Nicolas,

On Wed, 22 Jun 2022 at 20:39, Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
On Tuesday, 16 February 2021 at 10:25 +0100, Daniel Vetter wrote:
So I think if AMD also guarantees to drop clean cachelines just do the
same thing we do right now for intel integrated + discrete amd, but in
reverse. It's fragile, but it does work.
Sorry to disrupt, but if you pass V4L2 vmalloc data to the Intel display driver, you
also get nice dirt on the screen. If you have a UVC webcam that produces a pixel
format compatible with your display, you can reproduce the issue quite easily
with:

gst-launch-1.0 v4l2src device=/dev/video0 ! kmssink

P.S. Some frame rates are less likely to exhibit the issue; make sure you create
movement to see it.
Right, this is because the UVC data in a vmalloc() area is not
necessarily flushed from the CPU cache, and the importer expects it
will be.

Yeah, but that is something perfectly valid for an exporter to do. So the bug is not in UVC.

The only solution I could think of (not implemented) was to detect in the
attach() call what the importers can do (with dev->coherent_dma_mask, if I
recall), and otherwise flush the cache immediately and keep flushing it from
then on, signalling it for DQBUF (in a vb2 workqueue or the dqbuf ioctl, I don't
have an idea yet). I bet this idea is inapplicable where you have fences; we
don't have that in v4l2.
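
To make the attach-time idea concrete, here is a toy userspace model of it. It
is only a sketch: none of these names exist in the kernel, and the
snoops_cpu_cache flag is an assumed stand-in for whatever an exporter could
really learn about an importer at attach() time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an attaching importer; not a kernel structure. */
struct toy_importer {
	bool snoops_cpu_cache;	/* assumption: known to the exporter at attach time */
};

/* Hypothetical exporter state; flush_count only counts simulated flushes. */
struct toy_exporter {
	bool flush_before_dqbuf;
	unsigned int flush_count;
};

/* On attach: a non-snooping importer triggers one flush now and a sticky flag. */
void toy_attach(struct toy_exporter *exp, const struct toy_importer *imp)
{
	assert(exp != NULL && imp != NULL);
	if (!imp->snoops_cpu_cache) {
		exp->flush_count++;		/* "flush the cache immediately" */
		exp->flush_before_dqbuf = true;	/* "and keep flushing from then on" */
	}
}

/* On every dequeue: flush only if some importer needed it. */
void toy_dqbuf(struct toy_exporter *exp)
{
	assert(exp != NULL);
	if (exp->flush_before_dqbuf)
		exp->flush_count++;
}
```

In this model, attaching only snooping importers leaves dequeue free of
flushes, while a single non-snooping attach makes every later dequeue pay for
one.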

This idea was hinted at by Robert Becket (now in CC), but perhaps I picked it up
wrong, am explaining it wrong, etc. I'm no expert, I just noticed there wasn't
really a good plan for that, so one needs to make one up. I'm not aware of how
an importer could know how the memory was allocated by the exporter and, worse,
how an importer could figure out that the exporter is going to produce buffers
with a hot CPU cache (the UVC driver does a memcpy from USB chunks of variable
size to produce a fixed-size image).
This is exactly what Christian was saying above.

Well more or less.

The exporter isn't doing anything wrong here. DMA-bufs are supposed to be CPU cached and can also be cache hot.

The importer needs to be able to deal with that, either by flushing the CPU cache manually (awkward), rejecting the DMA-buf for this use case (display scanout), or working around it inside its driver (extra copy, different hw settings, etc...).
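
As a rough sketch of those three options, the importer's decision could be
modelled like this. This is a toy userspace model, not driver code: all names
are made up for illustration, and the order in which the fallbacks are tried is
an arbitrary assumption, not a recommendation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for an importer handed a DMA-buf. */
enum toy_import_action {
	TOY_IMPORT_DIRECT,	/* no cache problem, use the buffer as-is */
	TOY_IMPORT_FLUSH_FIRST,	/* flush the CPU cache manually before use */
	TOY_IMPORT_EXTRA_COPY,	/* work around it inside the driver */
	TOY_IMPORT_REJECT,	/* refuse the buffer for this use case */
};

/* Hypothetical importer capabilities; none of these are kernel fields. */
struct toy_import_caps {
	bool hw_snoops_cpu_cache;
	bool can_flush_cpu_cache;
	bool can_extra_copy;
};

enum toy_import_action
toy_choose_import_action(bool buffer_cpu_cached,
			 const struct toy_import_caps *caps)
{
	assert(caps != NULL);
	/* Uncached buffers and snooping hardware need no special handling. */
	if (!buffer_cpu_cached || caps->hw_snoops_cpu_cache)
		return TOY_IMPORT_DIRECT;
	if (caps->can_flush_cpu_cache)
		return TOY_IMPORT_FLUSH_FIRST;
	if (caps->can_extra_copy)
		return TOY_IMPORT_EXTRA_COPY;
	/* Nothing left: reject instead of showing stale data. */
	return TOY_IMPORT_REJECT;
}
```

The point of the model is only that the exporter never appears in the
decision: a cached, possibly cache-hot buffer is a valid input, and every
branch is the importer's responsibility.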

Regards,
Christian.


Cheers,
Daniel