Re: [PATCH v1 1/1] ARM: Select DMA_DIRECT_REMAP to fix restricted DMA

From: Robin Murphy
Date: Thu Sep 28 2023 - 11:33:22 EST


On 28/09/2023 4:16 pm, Arnd Bergmann wrote:
On Thu, Sep 28, 2023, at 10:00, Jim Quinlan wrote:
On Thu, Sep 28, 2023 at 9:32 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:

On Thu, Sep 28, 2023, at 08:07, Jim Quinlan wrote:
On Wed, Sep 27, 2023 at 7:10 PM Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:

Clearly if you want to do this, surely the ARM-specific
arch/arm/mm/dma-mapping.c and arch/arm/mm/dma-mapping-nommu.c
need to be removed at the same time?


Yes, this is the reason I used "RFC" as the fix looked too easy to be viable :-)
I debugged it enough to see that the host driver's writes to the
dma_alloc_coherent() region were not appearing in memory, and that
led me to DMA_DIRECT_REMAP.

Usually when you see a mismatch between the data observed by the
device and the CPU, the problem is an incorrect "dma-coherent"
property in the DT: either the device is coherent and accesses
the cache but the CPU tries to bypass it because the property
is missing, or there is an extraneous property and the CPU
goes through the cache but the device bypasses it.

I just searched, there are no "dt-coherent" properties in our device tree.
Also, even if we did have them, wouldn't things also fail when not using
restricted DMA?

Correct, it should be independent of restricted DMA, but it might
work by chance that way even if it's still wrong. If your DT
is marked as non-coherent (note: the property to look for
is "dma-coherent", not "dt-coherent"), can you check the
datasheet of the SoC to see if that is actually correct?

If the chip is designed to support high-speed devices on
PCIe, it's likely that the PCIe root complex is either coherent
with the caches, or can (and should) be configured that way
for performance reasons.

It could also be a driver bug if the device mixes up the
address spaces, e.g. passing virt_to_phys(pointer) rather
than the DMA address returned by dma_alloc_coherent().
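
To illustrate the kind of address-space mix-up described above, here is a
minimal user-space sketch, not kernel code. It assumes a toy system in which
the bus address seen by the device differs from the CPU physical address by a
constant offset; the offset value, the memory size, and every toy_* name are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE   64
#define BUS_OFFSET 16   /* hypothetical: bus address = physical address + 16 */

/* Toy system memory, indexed by CPU physical address. */
struct toy_system {
    unsigned char mem[MEM_SIZE];
};

/* Stand-in for the translation behind the DMA address a driver is given. */
uint32_t toy_phys_to_dma(uint32_t phys)
{
    return phys + BUS_OFFSET;
}

/* The device only understands bus addresses; the interconnect maps them back. */
int toy_device_write(struct toy_system *s, uint32_t bus_addr,
                     unsigned char value, size_t len)
{
    uint32_t phys;

    if (bus_addr < BUS_OFFSET)
        return -1;
    phys = bus_addr - BUS_OFFSET;
    if (phys > MEM_SIZE || len > MEM_SIZE - phys)
        return -1;
    memset(&s->mem[phys], value, len);
    return 0;
}

/*
 * Returns 1 if the device's write landed in the driver's buffer, 0 otherwise.
 * use_dma_addr != 0: the driver programs the DMA address (correct).
 * use_dma_addr == 0: the driver programs the physical address (the bug).
 */
int toy_write_lands_in_buffer(int use_dma_addr)
{
    struct toy_system s;
    const uint32_t buf_phys = 32;
    const size_t len = 8;
    uint32_t handed;

    memset(s.mem, 0, sizeof(s.mem));
    handed = use_dma_addr ? toy_phys_to_dma(buf_phys) : buf_phys;
    if (toy_device_write(&s, handed, 0x5A, len) != 0)
        return 0;
    return s.mem[buf_phys] == 0x5A;
}
```

In this model toy_write_lands_in_buffer(1) returns 1, while
toy_write_lands_in_buffer(0) returns 0 because the write lands 16 bytes below
the buffer. On a platform where the two address spaces happen to coincide, the
same bug would go unnoticed.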

This is an Intel 7260 part using the iwlwifi driver, I doubt it has
errors of that kind.

It's unlikely but not impossible, as the driver has some
unusual constructs, using a lot of coherent mappings that
might otherwise be streaming mappings, and relying on
dma_sync_single_for_device(..., DMA_BIDIRECTIONAL) for other
data, but without the corresponding dma_sync_single_for_cpu().
If all the testing happens on x86, this might easily lead
to a bug that only shows up on non-coherent systems but
is never seen during testing.

Probably the significant thing about restricted DMA is that it forces
all streaming DMA to be bounce-buffered. That should expose busted
synchronisation even more decisively than a lack of coherency. If
there's no IOMMU, then testing the driver in the absence of restricted
DMA but with "swiotlb=force" should confirm or disprove that.
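
As a rough illustration of why bounce buffering exposes a missing
dma_sync_single_for_cpu(), here is a small user-space model, not kernel code.
It assumes a single mapping whose data is copied between the driver's buffer
and a separate bounce slot at each sync point; all model_* names and the
buffer size are hypothetical, and the real SWIOTLB logic (direction handling,
unmap behaviour) is more involved than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy model of one streaming mapping forced through a bounce buffer. */
struct bounce_map {
    unsigned char *cpu_buf;          /* the driver's own buffer */
    unsigned char bounce[BUF_SIZE];  /* what the device actually accesses */
    size_t len;
};

/* Model of mapping: the bounce slot starts as a copy of the driver buffer. */
void model_map(struct bounce_map *m, unsigned char *cpu_buf, size_t len)
{
    assert(len <= BUF_SIZE);
    m->cpu_buf = cpu_buf;
    m->len = len;
    memcpy(m->bounce, cpu_buf, len);
}

/* Model of dma_sync_single_for_device(): copy driver buffer -> bounce slot. */
void model_sync_for_device(struct bounce_map *m)
{
    memcpy(m->bounce, m->cpu_buf, m->len);
}

/* Model of dma_sync_single_for_cpu(): copy bounce slot -> driver buffer. */
void model_sync_for_cpu(struct bounce_map *m)
{
    memcpy(m->cpu_buf, m->bounce, m->len);
}

/* The device never sees the driver buffer, only the bounce slot. */
void model_device_write(struct bounce_map *m, unsigned char value)
{
    memset(m->bounce, value, m->len);
}

/* Returns the first byte the CPU observes after the device wrote 0xAB. */
unsigned char model_cpu_reads_after_device_write(int call_sync_for_cpu)
{
    unsigned char buf[BUF_SIZE];
    struct bounce_map m;

    memset(buf, 0x00, sizeof(buf));
    model_map(&m, buf, sizeof(buf));
    model_sync_for_device(&m);
    model_device_write(&m, 0xAB);
    if (call_sync_for_cpu)
        model_sync_for_cpu(&m);
    return buf[0];
}
```

In this model the CPU reads 0xAB only when model_sync_for_cpu() is called.
Without it the CPU still sees the stale 0x00, whereas on a coherent system
with no bounce buffer the same omission would have no visible effect.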

Robin.

If the problem is not the "dma-coherent" property, can you
double-check if using a different PCIe device works, or narrow
down which specific buffer you saw get corrupted?

Arnd
