Re: [PATCH] arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA

From: Devarsh Thakkar
Date: Tue Aug 08 2023 - 17:21:50 EST


Hi Andrew,

Thanks for the review.

On 07/08/23 23:30, Andrew Davis wrote:
> On 8/7/23 1:03 AM, Devarsh Thakkar wrote:
>> Hi Nishanth,
>>
>> Thanks for the review.
>>
>> On 06/08/23 01:03, Nishanth Menon wrote:
>>> On 16:44-20230803, Devarsh Thakkar wrote:
>>>> Reserve 128MiB of global CMA which is also marked as re-usable
>>>> so that OS can also use the same if peripheral drivers are not using the
>>>> same.
>>>>
>>>> AM62x supports multimedia components such as GPU, dual Display and Camera.
>>>> Assuming the worst-case scenario where all 3 are run in parallel below
>>>> is the calculation :
>>>>
>>>> 1) OV5640 camera sensor supports 1920x1080 resolution
>>>> -> 1920 width x 1080 height x 2 bytesperpixel x 8 buffers
>>>>     (default in yavta) : 32MiB
>>>>
>>>> 2) 1920x1200 Microtips LVDS panel supported
>>>> -> 1920 width x 1080 height x 4 bytesperpixel x 2 buffers :
>>>>     16 MiB
>>>>
>>>> 3) 1920x1080 HDMI display supported
>>>> -> 1920 width x 1080 height x 4 bytesperpixel x 2 buffers :
>>>>     15.82 MiB which is ~16 MiB
>>>>
>>>> 4) IMG GPU shares with display allocated buffers while rendering
>>>>     but in case some dedicated operation viz color conversion,
>>>>     keeping same window of ~16 MiB for GPU too.
>>>>
>>>> Total is 80 MiB and adding 32 MiB for other peripherals and extra
>>>> 16 MiB to keep as buffer for fragmentation thus rounding total to 128
>>>> MiB.
>>>>
>>>> Signed-off-by: Devarsh Thakkar <devarsht@xxxxxx>
>>>> Acked-by: Darren Etheridge <detheridge@xxxxxx>
>>>> Signed-off-by: Vignesh Raghavendra <vigneshr@xxxxxx>
>>>> ---
>>>
>>> I don't think this is right approach. There are other techniques
>>> than having to do this (Andrew: please comment) and require drivers to
>>> behave properly.
>>
>> Sorry but I did not understand clearly the disadvantage of this approach.
>> Here we are reserving CMA and also marking it as re-usable so that in case
>> driver is not using it OS can use that region.
>>
>

Agreed, the examples where shared just for informational purpose.

> It isn't always that easy, many types of allocations can be pinned and
> cannot be placed in this region. It still has cost.
>

I agree if some allocations are requested with non-moveable pages than cma
region can't serve those, but that looks more like a tradeoff to me as we also
want to support those drivers requiring CMA to use it on-demand and for
allocations requiring non-moveable pages they can be still served through
non-CMA region.


>> Also I see quite a few vendors already taking this approach :
>>
>
> Just because others have gotten away with it doesn't mean it is correct :)
>
> There are some cases when the DMA/CMA region needs to be in a specific
> location as the hardware only supports some addresses (only some address
> pins wired out, etc..). But general CMA size selection is a configuration
> and so has no place in DT which should only be used to describe hardware.
>

But I do see the DT specification [0] mentioning this provision for CMA size
selection, so I don't think we are violating the DT specification while
choosing to set CMA size this way :


> Another issue I have is that this forces all users of these boards to
> have this rather large carveout, even if they do not intend to use all
> of these IP at the same time, or even at all.
>
> Actually, upstream we don't support GPU yet, so you can't use all of
> this carveout anyway.
>
> Lastly, large CMA carveouts as in this case are masking a bigger issue,
> there is hardware IP that cannot handle scatter-gather and there is
> no system level IOMMU to help with that. This simply does not scale,
> fragmentation can set in even with CMA in a running system, physically
> contiguous allocations can still fail. As our devices grow in complexity
> while still not having an IOMMU we would need to reserve ever increasingly
> sized CMA areas.
>

Yes I think the CMA allocator tries to address the same scenarios and CMA
reservation is needed to support use-cases with those hardware IP's.

Also, I agree one magic value of CMA may not fit aptly all the users as some
might not intend to use these IPs at all but a default value for CMA can be
chosen to support basic use-cases supported by SoC,
and if the setting doesn't suit the user they can override this setting using
bootargs or some other method.


> This might sound like an ad absurdum argument, but we only need to look
> at our current evil vendor tree to see where this leads [0]. Yes you
> are reading the size right, 1.75GB(!) of CMA..
>

Hmm I don't have much context on large CMA there, but that board has 32Gb RAM :)

> We need a better solution upstream, I'm not claiming I know what
> that solution is (probably something involved in the memory allocation
> to allow for more/larger movable pages). But for the above 3 reasons
> this patch is not a viable solution.
>

Regarding movable pages, I think the CMA reservation uses pages allocated with
MIGRATE_CMA [1] flag which ensure that it uses only the movable pages so
reserved memory can be used by system if not used by driver and driver can
reclaim it back when it requires.

I am not sure what could be better solution than this, but I think camera and
display are primary use-cases for AM625 and base device-tree will be enabling
these peripherals by default and so the basic use-cases with them should also
by supported out-of-box. My suggestion is to go with this approach and if
somebody find better approach than this, we can update to it at that time.

The alternatives I see are having a shared dma pool between a set of
peripherals but that will require each driver using it to call
of_reserved_mem_device_init(dev) before starting allocation and prevent other
drivers (those not linked with this reserved region) from using CMA, and so I
did not prefer that approach and prefer the base approach as shared in this patch.

Kindly let me know your opinion.

[0]
https://devicetree-specification.readthedocs.io/en/latest/chapter3-devicenodes.html#:~:text=If%20a%20linux%2Ccma%2Ddefault
[1] https://elixir.bootlin.com/linux/latest/source/include/linux/mmzone.h#L62

Regards
Devarsh

> [0]
> https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/arch/arm64/boot/dts/ti/k3-am69-sk.dts?h=ti-linux-6.1.y#n48
>
> Andrew
>
>> $grep -r cma-default arch/arm64/boot/dts
>> arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts:276:
>>   linux,cma-default;
>> arch/arm64/boot/dts/qcom/sc8280xp-crd.dts:222:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts:201:
>>           linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8ulp-evk.dts:32:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8-apalis-v1.1.dtsi:198:
>>   linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8mm-tqma8mqml.dtsi:48:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8dxl-evk.dts:50:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx93-tqma9352.dtsi:24:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8mq-tqma8mq.dtsi:59:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx8mn-tqma8mqnl.dtsi:46:
>> linux,cma-default;
>> arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts:28:
>> linux,cma-default;
>> arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts:67:
>> linux,cma-default;
>> arch/arm64/boot/dts/amlogic/meson-gx.dtsi:63:
>> linux,cma-default;
>> arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi:121:
>> linux,cma-default;
>> arch/arm64/boot/dts/amlogic/meson-a1.dtsi:59:
>> linux,cma-default;
>>
>>
>>   I am esp concerned since there are platforms based on
>>> am62x and just 256MB DDR.
>>>
>>
>> The file "k3-am62x-sk-common.dtsi" refers DDR memory size as 2Gb[1] and so I
>> put CMA reservation in same file assuming all boards including this file have
>> 2Gb.
>>
>> But if there are some boards having lesser DDR and including this
>> k3-am62x-sk-common.dtsi and overriding memory node, I can put the CMA
>> reservation node in board specific file i.e. k3-am625-sk.dts in V2.
>>
>> Kindly let me know if above is preferred approach.
>>
>> [1]
>> https://gitlab.com/linux-kernel/linux-next/-/blob/next-20230807/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi?ref_type=tags#L33
>>
>> Regards
>> Devarsh
>>
>>>>   arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
>>>>   1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
>>>> b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
>>>> index 34c8ffc553ec..9dd6e23ca9ca 100644
>>>> --- a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
>>>> +++ b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
>>>> @@ -47,6 +47,14 @@ ramoops@9ca00000 {
>>>>               pmsg-size = <0x8000>;
>>>>           };
>>>>   +        /* global cma region */
>>>> +        linux,cma {
>>>> +            compatible = "shared-dma-pool";
>>>> +            reusable;
>>>> +            size = <0x00 0x8000000>;
>>>> +            linux,cma-default;
>>>> +        };
>>>> +
>>>>           secure_tfa_ddr: tfa@9e780000 {
>>>>               reg = <0x00 0x9e780000 0x00 0x80000>;
>>>>               alignment = <0x1000>;
>>>> -- 
>>>> 2.34.1
>>>>
>>>