Re: [PATCH] drm: gem: add an option for supporting the dma-coherent hardware.

From: Sui Jingfeng
Date: Wed Jun 07 2023 - 23:03:45 EST


Hi,

On 2023/6/7 20:19, Maxime Ripard wrote:
On Wed, Jun 07, 2023 at 06:30:01PM +0800, Sui Jingfeng wrote:
On 2023/6/7 17:36, Paul Cercueil wrote:
Hi Sui,

Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit :
The single map_noncoherent member of struct drm_gem_dma_object may
not
sufficient for describing the backing memory of the GEM buffer
object.

Especially on dma-coherent systems, the backing memory is both cached
coherent for multi-core CPUs and dma-coherent for peripheral device.
Say architectures like X86-64, LoongArch64, Loongson Mips64, etc.

Whether a peripheral device is dma-coherent or not can be
implementation-dependent. The single map_noncoherent option is not
enough
to reflect real hardware anymore. For example, the Loongson LS3A4000
CPU
and LS2K2000/LS2K1000 SoC, peripheral device of such hardware
platform
allways snoop CPU's cache. Doing the allocation with
dma_alloc_coherent
function is preferred. The return buffer is cached, it should not
using
the default write-combine mapping. While with the current implement,
there
no way to tell the drm core to reflect this.

This patch adds cached and coherent members to struct
drm_gem_dma_object.
which allow driver implements to inform the core. Introducing new
mappings
while keeping the original default behavior unchanged.
Did you try to simply set the "dma-coherent" property to the device's
node?
But this approach can only be applied for the device driver with DT support.

X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically do not
have DT support.

They using ACPI to pass parameter from the firmware to Linux kernel.

You approach will lost the effectiveness on such a case.
Not really, no. All DT support is doing is setting some generic device
parameter based on that property, but the infrastructure is very much
generic and can be used on systems without a DT.

From what I understand if you add that property then Linux will use DMA
coherent memory even though you use dma_alloc_noncoherent() and the
sync_single_for_cpu() / sync_single_for_device() are then NOPs.
>
Please do not mitigate the problems with confusing method.
It's not a confusing method, it's one of the two main API to deal with
DMA buffers. And you might disagree with Paul but there's no need to be
rude about it.

This approach not only tend to generate confusion but also
implement-dependent and arch-dependent. It's definitely problematic.


How does the dma_alloc_coherent/dma_alloc_noncoherent is a ARCH specific
thing.

Dependent on how does the arch_dma_ops is implemented.


The definition of the coherent on different ARCH has different meanings.

The definition of the wirte-combine on different ARCH has different
meanings.


The wirte-combine(uncache acceleration) on mips is non dma-coherent.
Then MIPS breaks the DMA allocation semantics. A buffer allocated with
dma_alloc_wc is supposed to be coherent.

But on arm, It seem that wirte-combine is coherent. (guaranteed by arch
implement).


I also heard using dma_alloc_coherent  to allocation the buffer for the
non-coherent doesn't hurt, but the reverse is not true.


But please do not create confusion.

software composite is faster because better cacheusing rate and

cache is faster to read.

It is faster because it is cached, not because it is non-coherent.

non-coherent is arch thing and/or driver-side thing,

it is a side effect of  using the cached mapping.
Honestly, it's not clear to me what your point or issue is.

Going back to the description in your commit log, you mention that you
want to support multiple hardware that might or might not be DMA
coherent, and thus you want to allocate a buffer with different
attributes depending on that?

Like, you say that the LS3A4000 has a coherency unit and thus doing the
allocation with dma_alloc_coherent is preferred. Preferred to what? A WC
buffer? Why?

A WC buffer is a coherent buffer that is allowed to cache writes.

It doesn't have to, and worst case scenario you're inexactly the same
case than a dma_alloc_coherent buffer.

It should left to driver to handle such a side effect. The device
driver know their device, so its the device driver's responsibility to
maintain the coherency.
Not really, no. Some driver are used across multiple SoCs and multiple
arch. It doesn't make any sense to encode this in the driver... which is
why it's in the DT in the first place, and abstracted away by the DMA
API. Like, do you really expect the amdgpu driver to know the DMA
attributes it needs to allocate a buffer from when running from a
RaspberryPi?

On loongson platform, we don't need to call
drm_fb_dma_sync_non_coherent() function, Its already guaranteed by
hardware.
And mostly guaranteed by dma_alloc_coherent. And if you wanted to call
it anyway, it would be a nop if the device is declared as coherent
already.

I think you're thinking about this backward. A buffer has mapping
attributes, and a device has hardware properties.

The driver (ie, software) will allocate a buffer with some mapping
attributes, and will assume that they are met in the rest of its code.
How they are met is an implementation detail of the hardware, and for
all the driver cares, it doesn't have to match.

You can allocate a WC buffer to use on a non-coherent device and that's
fine. You can allocate a non-coherent buffer on a coherent device and
that's fine too. The DMA API will make everything work when it needs to,
and if the hardware already provides stronger guarantees, then it will
just skip whatever is redundant.

So you need to write your driver using buffer is the most convenient for
you, and it's really all that matters at the driver level. But for that
to work, you need to flag the coherence-ness of your devices properly,
like Paul suggested.

I seems that you are right, at least at ARM world.

Thanks for you tell me this, I might be wrong, this patch may have small problems.

I think I should take more time to investigate this problem.

But I need to do more test before I can reply , thank Paul also.

Maxime

--
Jingfeng