Re: [PATCH v1 1/1] ARM: Select DMA_DIRECT_REMAP to fix restricted DMA

From: Marek Szyprowski
Date: Fri Oct 20 2023 - 04:16:58 EST


Dear All,

I didn't have enough time to follow the whole discussion, but it looks
like I can add some comments.

On 06.10.2023 09:40, Christoph Hellwig wrote:
> On Thu, Oct 05, 2023 at 01:53:33PM -0400, Jim Quinlan wrote:
>>> Yes, DMA_DIRECT_REMAP should only be used for platforms using the
>>> generic remap that plugs straight into dma-direct and
>>> bypasses arch_dma_alloc.
>>>
>>> ARM first needs support to directly set the uncached/wc bits on
>>> the direct mapping for CMA, which should be fairly simple but requires
>>> widespread testing.
>>>
>>> I'd be happy to work with anyone who wants to look into this.
>> I'd like to look into this and help make it work for ARCH=arm but you
>> seem to be saying that you also need help from ARM the company?
> No, I don't care about companies. I just need someone (singular or
> plural) to test a wide range of arm systems.
>
> Here is my idea for the attack plan:
>
> As step 1 ignore the whole CMA direct map issue, and just do the
> trivial generic dma remap conversion. This should involve:
>
> - select DMA_DIRECT_REMAP
> - provide arch_dma_prep_coherent to flush out all dirty data by
> calling __dma_clear_buffer
> - remove the existing arch_dma_alloc/arch_dma_free and all their
> infrastructure
>
> With this things should work fine on any system not using CMA

This won't be that easy.

For historical reasons (performance and limitations of the pre-ARMv7
cores), on 32-bit ARM the whole kernel direct mapping is done using
so-called 'sections' (1 MiB in size, afair). Those sections are created
in the per-process MMU page tables (there is no separate MMU table for
the kernel mappings), so altering those mappings requires updating bits
in the page tables of all processes in the system. In practice this
means that those mappings have to be static once created at boot time.
That's why, when no CMA is selected, all dma_alloc_coherent()
allocations are limited to a rather small region, which is already
remapped as non-cached during boot.

This is a serious limitation, which is why another approach was needed,
and it turned out that CMA can resolve that issue too.

CMA limits the DMA allocations to specific memory regions, thus each
such region (part of the kernel's direct map) can be easily remapped
during boot time with standard 4K pages and then altered/updated
on-demand when a coherent allocation is performed. This slightly lowers
the performance of memory-related operations on that region (access
to 4K pages is a bit slower compared to memory mapped with sections),
but CMA is mainly used on the newer ARMv7 systems, which often have a
decent cache, which mitigates such a performance drop.


> Then attack the CMA direct mapping:
>
> - modify the core DMA mapping code so that the
> ARCH_HAS_DMA_SET_UNCACHED code is only used conditionally
> I'm not quite sure what the right checks and right place is,
> but the intent is that it should allow arm to only use that
> path for CMA allocations. For all existing users of
> CONFIG_ARCH_HAS_DMA_SET_UNCACHED it should evaluate to
> a compile-time true to not change the behavior or code
> generation
> - then in arm select ARCH_HAS_DMA_SET_UNCACHED and implement
> arch_dma_set_uncached, arch_dma_clear_uncached and the new
> helper above

The plan for the CMA related case sounds good.

If you need any ARM-related tests, let me know. I have a bunch of
ARM-based test machines, which I use for testing linux-next on a
day-to-day basis.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland