Re: [PATCHv5 06/14] remoteproc/omap: Initialize and assign reserved memory node

From: Andrew F. Davis
Date: Thu Jan 30 2020 - 18:04:58 EST


On 1/30/20 5:06 PM, Suman Anna wrote:
> On 1/30/20 3:57 PM, Suman Anna wrote:
>> On 1/30/20 3:50 PM, Andrew F. Davis wrote:
>>> On 1/30/20 4:39 PM, Suman Anna wrote:
>>>> On 1/30/20 3:19 PM, Andrew F. Davis wrote:
>>>>> On 1/30/20 3:39 PM, Suman Anna wrote:
>>>>>> On 1/30/20 2:22 PM, Andrew F. Davis wrote:
>>>>>>> On 1/30/20 2:55 PM, Suman Anna wrote:
>>>>>>>> On 1/30/20 1:42 PM, Tero Kristo wrote:
>>>>>>>>> On 30/01/2020 21:20, Andrew F. Davis wrote:
>>>>>>>>>> On 1/30/20 2:18 PM, Tero Kristo wrote:
>>>>>>>>>>> On 30/01/2020 20:11, Andrew F. Davis wrote:
>>>>>>>>>>>> On 1/16/20 8:53 AM, Tero Kristo wrote:
>>>>>>>>>>>>> From: Suman Anna <s-anna@xxxxxx>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The reserved memory nodes are not assigned to platform devices by
>>>>>>>>>>>>> default in the driver core to avoid the lookup for every platform
>>>>>>>>>>>>> device and incur a penalty as the real users are expected to be
>>>>>>>>>>>>> only a few devices.
>>>>>>>>>>>>>
>>>>>>>>>>>>> OMAP remoteproc devices fall into the above category and the OMAP
>>>>>>>>>>>>> remoteproc driver _requires_ specific CMA pools to be assigned
>>>>>>>>>>>>> for each device at the moment to align on the location of the
>>>>>>>>>>>>> vrings and vring buffers in the RTOS-side firmware images. So,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Same comment as before, this is a firmware issue for only some
>>>>>>>>>>>> firmwares
>>>>>>>>>>>> that do not handle being assigned vring locations correctly and instead
>>>>>>>>>>>> hard-code them.
>>>>>>>>
>>>>>>>> As for this statement, this can do with some updating. Post 4.20,
>>>>>>>> because of the lazy allocation scheme used for carveouts including the
>>>>>>>> vrings, the resource tables now have to use FW_RSC_ADDR_ANY and will
>>>>>>>> have to wait for the vdev synchronization to happen.
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I believe we discussed this topic in length in previous version but
>>>>>>>>>>> there was no conclusion on it.
>>>>>>>>>>>
>>>>>>>>>>> The commit desc might be a bit misleading, we are not actually forced to
>>>>>>>>>>> use specific CMA buffers, as we use IOMMU to map these to device
>>>>>>>>>>> addresses. For example IPU1/IPU2 use internally exact same memory
>>>>>>>>>>> addresses, iommu is used to map these to specific CMA buffer.
>>>>>>>>>>>
>>>>>>>>>>> CMA buffers are mostly used so that we get aligned large chunk of memory
>>>>>>>>>>> which can be mapped properly with the limited IOMMU OMAP family of chips
>>>>>>>>>>> have. Not sure if there is any sane way to get this done in any other
>>>>>>>>>>> manner.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why not use the default CMA area?
>>>>>>>>>
>>>>>>>>> I think using default CMA area getting the actual memory block is not
>>>>>>>>> guaranteed and might fail. There are other users for the memory, and it
>>>>>>>>> might get fragmented at the very late phase we are grabbing the memory
>>>>>>>>> (omap remoteproc driver probe time.) Some chunks we need are pretty large.
>>>>>>>>>
>>>>>>>>> I believe I could experiment with this a bit though and see, or Suman
>>>>>>>>> could maybe provide feedback why this was designed initially like this
>>>>>>>>> and why this would not be a good idea.
>>>>>>>>
>>>>>>>> I have given some explanation on this on v4 as well, but if it is not
>>>>>>>> clear, there are restrictions with using default CMA. Default CMA has
>>>>>>>> switched to be assigned from the top of the memory (higher addresses,
>>>>>>>> since 3.18 IIRC), and the MMUs on IPUs and DSPs can only address
>>>>>>>> 32-bits. So, we cannot blindly use the default CMA pool, and this will
>>>>>>>> definitely not work on boards > 2 GB RAM. And, if you want to add in any
>>>>>>>> firewall capability, then specific physical addresses becomes mandatory.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If you need 32bit range allocations then
>>>>>>> dma_set_mask(dev, DMA_BIT_MASK(32));
>>>>>>>
>>>>>>> I'm not saying don't have support for carveouts, just make them
>>>>>>> optional, keystone_remoteproc.c does this:
>>>>>>>
>>>>>>> if (of_reserved_mem_device_init(dev))
>>>>>>> dev_warn(dev, "device does not have specific CMA pool\n");
>>>>>>>
>>>>>>> There doesn't even needs to be a warning but that is up to you.
>>>>>>
>>>>>> It is not exactly an apples to apples comparison. K2s do not have MMUs,
>>>>>> and most of our firmware images on K2 are actually running out of the
>>>>>> DSP internal memory.
>>>>>>
>>>>>
>>>>>
>>>>> So again we circle back to it being a firmware issue, if K2 can get away
>>>>> without needing carveouts and it doesn't even have an MMU then certainly
>>>>> OMAP/DRA7x class devices can handle it even better given they *do* have
>>>>> an IOMMU. Unless someone is hard-coding the IOMMU configuration.. In
>>>>> which case we are still just hacking around the problem here with
>>>>> mandatory specific address memory carveouts.
>>>>
>>>> Optional carveouts on OMAP remoteprocs can be an enhancement in the
>>>> future, but at the moment, we won't be able to run use-cases without
>>>> this. And I have already given some of the reasons for the same here and
>>>> on v4.
>>>>
>>>
>>>
>>> No reason to be dismissive, my questions are valid.
>>>
>>> What "use-cases" are we talking about, I have firmware that doesn't need
>>> specific carved-out addresses.
>>
>> I think you are well aware of all the usecases we provide with the TI
>> SDKs with IPUs and DSPs. And what is the firmware that you have and what
>> do you use it for?
>>


Yes I know exactly the pieces of TI firmware we are talking about and
why it they still need carveouts. That's not the point, our firmware may
have issues and hard-coding, but we need to allow for correctly built
firmware that doesn't need carveouts also. This driver should not fail
if a carveout is not provided. The remoteproc can run fine without a
carveout, only some firmwares cannot, so it should be optional and not
forced in DT on everyone using our DSP/IPU.


>> If you have misbehaving firmware that
>>> needs statically carved out memory addresses then you can have carveouts
>>> if you want, but it should be optional.
>> If I don't want to pollute my
>>> system's memory space with a bunch of carveout holes then I shouldn't
>>> have to just because your specific firmware needs them.
>>
>> Further follow-up series like early-boot and late-attach will mandate
>> fixed carveouts actually. You cannot just run out of any random memory.
>


Those are different, the location of the loaded firmware in memory will
need to be carved out if it is in use by a remote core before Linux
boots. This carveout is for Linux to allocate from to load the Vrings
and other memory it may need. When late-attach shows up then we can
think about how to handle those.


> Also, these are CMA pools ("reusable"), so they are not actual carveout
> holes ("no-map"). This is the preferred method in remoteproc mode so
> that the memory is available for kernel when remoteprocs are not in use.
> Customers can always choose to make these carveouts so that they do not
> run into memory allocation issues when changing firmwares and under
> stress conditions. These will have to be carveouts for early-boot usecases.
>


Even "reusable" carveouts can only be used by re-locateable memory
(caches and such) so still not a good thing to have your memory space
full of them.

> Customers can always choose to make these carveouts

That is exactly what I am saying, they can choose, but it should be
optional, the current binding and driver make them mandatory or the
driver will not probe.

Andrew


> regards
> Suman
>
>>
>> regards
>> Suman
>>
>>>
>>> Andrew
>>>
>>>
>>>> regards
>>>> Suman
>>>>
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>>> regards
>>>>>> Suman
>>>>>>
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>>
>>>>>>>> regards
>>>>>>>> Suman
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Tero
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -Tero
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is not a requirement of the remote processor itself and so it
>>>>>>>>>>>> should not fail to probe if a specific memory carveout isn't given.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> use the of_reserved_mem_device_init/release() API appropriately
>>>>>>>>>>>>> to assign the corresponding reserved memory region to the OMAP
>>>>>>>>>>>>> remoteproc device. Note that only one region per device is
>>>>>>>>>>>>> allowed by the framework.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Suman Anna <s-anna@xxxxxx>
>>>>>>>>>>>>> Signed-off-by: Tero Kristo <t-kristo@xxxxxx>
>>>>>>>>>>>>> Reviewed-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> v5: no changes
>>>>>>>>>>>>>
>>>>>>>>>>>>> ÂÂ drivers/remoteproc/omap_remoteproc.c | 12 +++++++++++-
>>>>>>>>>>>>> ÂÂ 1 file changed, 11 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/remoteproc/omap_remoteproc.c
>>>>>>>>>>>>> b/drivers/remoteproc/omap_remoteproc.c
>>>>>>>>>>>>> index 0846839b2c97..194303b860b2 100644
>>>>>>>>>>>>> --- a/drivers/remoteproc/omap_remoteproc.c
>>>>>>>>>>>>> +++ b/drivers/remoteproc/omap_remoteproc.c
>>>>>>>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>>>>>>> ÂÂ #include <linux/module.h>
>>>>>>>>>>>>> ÂÂ #include <linux/err.h>
>>>>>>>>>>>>> ÂÂ #include <linux/of_device.h>
>>>>>>>>>>>>> +#include <linux/of_reserved_mem.h>
>>>>>>>>>>>>> ÂÂ #include <linux/platform_device.h>
>>>>>>>>>>>>> ÂÂ #include <linux/dma-mapping.h>
>>>>>>>>>>>>> ÂÂ #include <linux/remoteproc.h>
>>>>>>>>>>>>> @@ -480,14 +481,22 @@ static int omap_rproc_probe(struct
>>>>>>>>>>>>> platform_device *pdev)
>>>>>>>>>>>>> ÂÂÂÂÂÂ if (ret)
>>>>>>>>>>>>> ÂÂÂÂÂÂÂÂÂÂ goto free_rproc;
>>>>>>>>>>>>> ÂÂ +ÂÂÂ ret = of_reserved_mem_device_init(&pdev->dev);
>>>>>>>>>>>>> +ÂÂÂ if (ret) {
>>>>>>>>>>>>> +ÂÂÂÂÂÂÂ dev_err(&pdev->dev, "device does not have specific CMA
>>>>>>>>>>>>> pool\n");
>>>>>>>>>>>>> +ÂÂÂÂÂÂÂ goto free_rproc;
>>>>>>>>>>>>> +ÂÂÂ }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> ÂÂÂÂÂÂ platform_set_drvdata(pdev, rproc);
>>>>>>>>>>>>> ÂÂ ÂÂÂÂÂ ret = rproc_add(rproc);
>>>>>>>>>>>>> ÂÂÂÂÂÂ if (ret)
>>>>>>>>>>>>> -ÂÂÂÂÂÂÂ goto free_rproc;
>>>>>>>>>>>>> +ÂÂÂÂÂÂÂ goto release_mem;
>>>>>>>>>>>>> ÂÂ ÂÂÂÂÂ return 0;
>>>>>>>>>>>>> ÂÂ +release_mem:
>>>>>>>>>>>>> +ÂÂÂ of_reserved_mem_device_release(&pdev->dev);
>>>>>>>>>>>>> ÂÂ free_rproc:
>>>>>>>>>>>>> ÂÂÂÂÂÂ rproc_free(rproc);
>>>>>>>>>>>>> ÂÂÂÂÂÂ return ret;
>>>>>>>>>>>>> @@ -499,6 +508,7 @@ static int omap_rproc_remove(struct
>>>>>>>>>>>>> platform_device *pdev)
>>>>>>>>>>>>> ÂÂ ÂÂÂÂÂ rproc_del(rproc);
>>>>>>>>>>>>> ÂÂÂÂÂÂ rproc_free(rproc);
>>>>>>>>>>>>> +ÂÂÂ of_reserved_mem_device_release(&pdev->dev);
>>>>>>>>>>>>> ÂÂ ÂÂÂÂÂ return 0;
>>>>>>>>>>>>> ÂÂ }
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --Â
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
>>>>>>>>> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>>>>>>>>
>>>>>>
>>>>
>>
>