Re: Bug in add_dma_entry()'s debugging code

From: Christoph Hellwig
Date: Tue Nov 28 2023 - 08:37:14 EST


On Mon, Nov 27, 2023 at 11:51:21AM -0500, Alan Stern wrote:
> The buffers in the bug report were allocated by kmalloc(). Doesn't
> kmalloc() guarantee that on architectures with non-cache-coherent DMA,
> allocations will be aligned on cache-line boundaries (or larger)? Isn't
> this what ARCH_DMA_MINALIGN and ARCH_KMALLOC_MINALIGN are supposed to
> take care of in include/linux/slab.h?

Oh. Yes, the variable alignment from kmalloc make things complicated.

> Architectures must ensure that kmalloc'ed buffer is
> DMA-safe. Drivers and subsystems depend on it. If an architecture
> isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
> the CPU cache is identical to data in main memory),
> ARCH_DMA_MINALIGN must be set so that the memory allocator
> makes sure that kmalloc'ed buffer doesn't share a cache line with
> the others. See arch/arm/include/asm/cache.h as an example.
>
> It says nothing about avoiding more than one DMA operation at a time to
> prevent overlap. Is the documentation wrong?

The documentation is a bit lacking unfortunately. Again, assuming
you actually have non-coherent mappings you'd easily break the fragile
cache line ownership if you did. That doesn't apply to x86 as-is, but
we really try to keep drivers portable.

> > This might not have an
> > effect on the particular plaform you are currently running on, but it
> > is still wrong.
>
> Who decides what is right and what is wrong in this area? Clearly you
> have a different opinion from David S. Miller, Richard Henderson, and
> Jakub Jelinek (the authors of that documentation file).

I don't think this about anyone's opinion. It's a fundamental propery of
how to manage caches in the face of non-coherent DMA.

>
> > Note that there have been various mumblings about
> > using nosnoop dma on x86, in which case you'll have the issue there
> > as well.
>
> Unless the people who implement nosnoop DMA also modify kmalloc() or
> ARCH_DMA_MINALIGN.

Or don't do it on kmalloc buffers.

> I guess the real question boils down to where the responsiblity should
> lie. Should kmalloc() guarantee that the memory regions it provides
> will always be suitable for DMA, with no overlap issues? Or should all
> the innumerable users of kmalloc() be responsible for jumping through
> hoops to request arch-specific minimum alignment for their DMA buffers?

I'd actually go one step back:

1) for not cache coherent DMA you can't do overlapping operations inside
a cache line
2) dma-debug is intended to find DMA API misuses, even if they don't
have bad effects on your particular system
3) the fact that the kmalloc implementation returns differently aligned
memory depending on the platform breaks how dma-debug works currently

The logical confcusion from that would be that IFF dma-debug is enabled on
any platform we need to set ARCH_DMA_MINALIGN to the cache line size.

BUT: we're actually reduzing our dependency on ARCH_DMA_MINALIGN by
moving to bounce buffering unaligned memory for non-coherent
architectures, which makes this even more complicated. Right now I
don't have a good idea how to actually deal with having the cachline
sanity checks with that, but I'm Ccing some of the usual suspects if
they have a better idea.