Re: [PATCH] riscv: add ARCH_DMA_MINALIGN support

From: Arnd Bergmann
Date: Mon Aug 09 2021 - 03:49:39 EST


On Mon, Aug 9, 2021 at 8:20 AM Xianting TIan
<xianting.tian@xxxxxxxxxxxxxxxxx> wrote:
>
> >> +#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> > It's not a good idea to blindly set this for all riscv. For "coherent"
> > platforms, this is not necessary and will waste memory.
>
> I checked ARCH_DMA_MINALIGN definition, "If an architecture isn't fully
> DMA-coherent, ARCH_DMA_MINALIGN must be set".
>
> so that the memory allocator makes sure that kmalloc'ed buffer doesn't
> share a cache line with the others.
>
> Documentation/core-api/dma-api-howto.rst
>
> 2) ARCH_DMA_MINALIGN
>
> Architectures must ensure that kmalloc'ed buffer is
> DMA-safe. Drivers and subsystems depend on it. If an architecture
> isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
> the CPU cache is identical to data in main memory),
> ARCH_DMA_MINALIGN must be set so that the memory allocator
> makes sure that kmalloc'ed buffer doesn't share a cache line with
> the others. See arch/arm/include/asm/cache.h as an example.
>
> Note that ARCH_DMA_MINALIGN is about DMA memory alignment
> constraints. You don't need to worry about the architecture data
> alignment constraints (e.g. the alignment constraints about 64-bit
> objects).

The platform spec [1] says about this:

| Memory accesses by I/O masters can be coherent or non-coherent
| with respect to all hart-related caches.

So the kernel in its default configuration cannot assume that DMA is
cache coherent on RISC-V. Making this configurable implies that
a kernel that is configured for cache-coherent machines can no longer
run on all hardware that follows the platform spec.

We have the same problem on arm64, where most of the server parts
are cache coherent, but the majority of the low-end embedded devices
are not, and we require that a single kernel can run on all of the above.
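
To put a rough number on the memory cost mentioned in the review comment
quoted above, here is a minimal user-space sketch, not kernel code. The
64-byte line size, the 8-byte natural minimum, the power-of-two-only
buckets and all sketch_* names are assumptions made for illustration;
the real allocator has more size classes than this.

```c
#include <stddef.h>

/* Assumed values for illustration only; real hardware and configs differ. */
#define SKETCH_CACHE_LINE  64u  /* hypothetical DMA minimum alignment in bytes */
#define SKETCH_NATURAL_MIN  8u  /* hypothetical smallest bucket without a DMA constraint */

/*
 * Round a request up to the next power-of-two bucket that is at least
 * min_size bytes. min_size is expected to be a power of two.
 */
static size_t sketch_bucket_size(size_t size, size_t min_size)
{
	size_t bucket = min_size;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Bytes of padding when a request of `size` bytes lands in its bucket. */
static size_t sketch_padding(size_t size, size_t min_size)
{
	return sketch_bucket_size(size, min_size) - size;
}
```

With these assumed numbers, sketch_bucket_size(16, SKETCH_CACHE_LINE)
is 64 while sketch_bucket_size(16, SKETCH_NATURAL_MIN) is 16, so a
16-byte allocation carries 48 bytes of padding once the minimum is
raised to a full cache line. A single kernel image has to pay that cost
everywhere if the minimum is a compile-time constant.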

One idea that we have discussed several times is to start the kernel
without the small kmalloc caches and defer their creation until a
later point in the boot process after determining whether any
non-coherent devices have been discovered. Any in-kernel structures
that have an explicit ARCH_DMA_MINALIGN alignment won't
benefit from this, but any subsequent kmalloc() calls can use the
smaller caches. The tricky bit is finding out whether /everything/ on
the system is cache-coherent or not, since we do not have a global
flag for that in the DT. See [2] for a recent discussion.
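
The deferred-creation idea could look roughly like the following
user-space model. This is only a sketch of the control flow, not a
proposal for the slab code: struct sketch_allocator, the sketch_*
functions, the 64-byte and 8-byte constants and the "count of
non-coherent devices" input are all hypothetical, and how that count
would actually be obtained is exactly the open problem described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_DMA_MINALIGN 64u  /* hypothetical cache-line-sized minimum */
#define SKETCH_SMALL_MIN     8u  /* hypothetical smallest bucket when DMA is coherent */

struct sketch_allocator {
	size_t min_bucket;    /* smallest bucket currently available */
	bool   probing_done;  /* set once device discovery has finished */
};

/* Early boot: nothing is known about devices yet, so stay DMA-safe. */
static void sketch_early_init(struct sketch_allocator *a)
{
	a->min_bucket = SKETCH_DMA_MINALIGN;
	a->probing_done = false;
}

/*
 * Later in boot: enable the small buckets only if no non-coherent
 * device was found. Otherwise keep the conservative minimum.
 */
static void sketch_late_init(struct sketch_allocator *a,
			     unsigned int noncoherent_devices)
{
	if (noncoherent_devices == 0)
		a->min_bucket = SKETCH_SMALL_MIN;
	a->probing_done = true;
}

/* Bucket a request would be served from at this point in boot. */
static size_t sketch_alloc_size(const struct sketch_allocator *a, size_t size)
{
	size_t bucket = a->min_bucket;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}
```

In this model a 16-byte request is served from a 64-byte bucket before
sketch_late_init() runs, and from a 16-byte bucket afterwards if the
device count was zero. Structures that carry an explicit alignment
annotation are not modelled here; they would keep their padding either
way.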

Arnd

[1] https://github.com/riscv/riscv-platform-specs/blob/main/riscv-platform-spec.adoc#architecture
[2] https://lore.kernel.org/linux-arm-kernel/20210527124356.22367-1-will@xxxxxxxxxx/