Re: [RFC PATCH] arm64: mm: Add invalidate back in arch_sync_dma_for_device when FROM_DEVICE

From: Ard Biesheuvel
Date: Thu Nov 17 2022 - 03:24:50 EST


On Thu, 17 Nov 2022 at 07:33, Nanyong Sun <sunnanyong@xxxxxxxxxx> wrote:
>
> The commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE
> buffers at start of DMA transfer") replaces invalidate with clean
> when DMA_FROM_DEVICE, this changes the behavior of functions like
> dma_map_single() and dma_sync_single_for_device(*, *, *, DMA_FROM_DEVICE),
> then it may make some drivers works unwell because the implementation
> of these DMA APIs lose the original cache invalidation.
>
> Situation 1:
...
> Situation 2:
> After backporting the above commit, we find a network card driver go
> wrong with cache inconsistency when doing DMA transfer: CPU got the
> stale data in cache when reading DMA data received from device.

I suppose this means those drivers may lack dma_sync_single_for_cpu()
calls after the inbound transfers complete, and are instead relying on
the cache invalidation performed before the transfer to make the DMA'd
data visible to the CPU.

This is buggy and fragile, and should be fixed in any case. There is
no guarantee that the CPU will not preload parts of the buffer into
the cache while DMA is in progress, so the invalidation must occur
strictly after the device has finished transferring the data.

> A similar phenomenon happens on sata disk drivers, it involves
> mainline modules like libata, scsi, ahci etc, and is hard to find
> out which line of code results in the error.
>

Could you identify the actual hardware and drivers that you are
observing the issue on? Claiming that everything is broken is not very
helpful in narrowing it down (although I am not saying you are wrong
:-))

> It seems that some dirvers may go wrong and have to match the
> implementation changes of the DMA APIs, and it would be confused
> because the behavior of these DMA APIs on arm64 are different
> from other archs.
>
> Add invalidate back in arch_sync_dma_for_device() to keep drivers
> compatible by replace dcache_clean_poc with dcache_clean_inval_poc
> when DMA_FROM_DEVICE.
>

So notably, the patch in question removes cache invalidation *without*
clean, and what you are adding here is clean+invalidate. (Invalidation
without clean may undo the effect of, e.g., the memzero() of a secret
in memory, and so it is important that we don't add that back if we
can avoid it)

Since we won't lose the benefits of that change, incorporating
invalidation at this point should be fine: clean+invalidate shouldn't
be more expensive than clean, and [correctly written] drivers will
invalidate those lines anyway, as the data has to come from DRAM in
any case.

So unless fixing the drivers in question is feasible, this change
seems reasonable to me.


> Fixes: c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")
> Signed-off-by: Nanyong Sun <sunnanyong@xxxxxxxxxx>
> ---
> arch/arm64/mm/dma-mapping.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 3cb101e8cb29..07f6a3089c64 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -18,7 +18,10 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> {
> unsigned long start = (unsigned long)phys_to_virt(paddr);
>
> - dcache_clean_poc(start, start + size);
> + if (dir == DMA_FROM_DEVICE)
> + dcache_clean_inval_poc(start, start + size);
> + else
> + dcache_clean_poc(start, start + size);
> }
>
> void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> --
> 2.25.1
>