Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

From: Yunsheng Lin
Date: Thu Nov 16 2023 - 06:12:11 EST


On 2023/11/16 3:05, Mina Almasry wrote:
> On Wed, Nov 15, 2023 at 10:07 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>>
>> On Wed, Nov 15, 2023 at 1:29 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>>
>>> On 2023/11/14 23:41, Willem de Bruijn wrote:
>>>>>
>>>>> I am not sure dma-buf maintainer's concern is still there with this patchset.
>>>>>
>>>>> Whatever name you calling it for the struct, however you arrange each field
>>>>> in the struct, some metadata is always needed for dmabuf to intergrate into
>>>>> page pool.
>>>>>
>>>>> If the above is true, why not utilize the 'struct page' to have more unified
>>>>> handling?
>>>>
>>>> My understanding is that there is a general preference to simplify struct
>>>> page, and at the least not move in the other direction by overloading the
>>>> struct in new ways.
>>>
>>> As my understanding, the new struct is just mirroring the struct page pool
>>> is already using, see:
>>> https://elixir.free-electrons.com/linux/v6.7-rc1/source/include/linux/mm_types.h#L119
>>>
>>> If there is simplifying to the struct page_pool is using, I think the new
>>> stuct the devmem memory provider is using can adjust accordingly.
>>>
>>> As a matter of fact, I think the way 'struct page' for devmem is decoupled
>>> from mm subsystem may provide a way to simplify or decoupled the already
>>> existing 'struct page' used in netstack from mm subsystem, before this
>>> patchset, it seems we have the below types of 'struct page':
>>> 1. page allocated in the netstack using page pool.
>>> 2. page allocated in the netstack using buddy allocator.
>>> 3. page allocated in other subsystem and passed to the netstack, such as
>>> zcopy or spliced page?
>>>
>>> If we can decouple 'struct page' for devmem from mm subsystem, we may be able
>>> to decouple the above 'struct page' from mm subsystem one by one.
>>>
>>>>
>>>> If using struct page for something that is not memory, there is ZONE_DEVICE.
>>>> But using that correctly is non-trivial:
>>>>
>>>> https://lore.kernel.org/all/ZKyZBbKEpmkFkpWV@xxxxxxxx/
>>>>
>>>> Since all we need is a handle that does not leave the network stack,
>>>> a network specific struct like page_pool_iov entirely avoids this issue.
>>>
>>> Yes, I am agree about the network specific struct.
>>> I am wondering if we can make the struct more generic if we want to
>>> intergrate it into page_pool and use it in net stack.
>>>
>>>> RFC v3 seems like a good simplification over RFC v1 in that regard to me.
>>>> I was also pleasantly surprised how minimal the change to the users of
>>>> skb_frag_t actually proved to be.
>>>
>>> Yes, I am agreed about that too. Maybe we can make it simpler by using
>>> a more abstract struct as page_pool, and utilize some features of
>>> page_pool too.
>>>
>>> For example, from the page_pool doc, page_pool have fast cache and
>>> ptr-ring cache as below, but if napi_frag_unref() call
>>> page_pool_page_put_many() and return the dmabuf chunk directly to
>>> gen_pool in the memory provider, then it seems we are bypassing the
>>> below caches in the page_pool.
>>>
>>
>> I think you're just misunderstanding the code. The page recycling
>> works with my patchset. napi_frag_unref() calls napi_pp_put_page() if
>> recycle == true, and that works the same with devmem as with regular
>> pages.
>>
>> If recycle == false, we call page_pool_page_put_many() which will call
>> put_page() for regular pages and page_pool_iov_put_many() for devmem
>> pages. So, the memory recycling works exactly the same as before with
>> devmem as with regular pages. In my tests I do see the devmem being
>> recycled correctly. We are not bypassing any caches.
>>
>>
>
> Ah, taking a closer look here, the devmem recycling works for me but I
> think that's a side effect to the fact that the page_pool support I
> implemented with GVE is unusual. I currently allocate pages from the
> page_pool but do not set skb_mark_for_recycle(). The page recycling
> still happens when GVE is done with the page and calls
> page_pool_put_full_pgae(), as that eventually checks the refcount on
> the devmem and recycles it.
>
> I will fix up the GVE to call skb_mark_for_recycle() and ensure the
> napi_pp_put_page() path recycles the devmem or page correctly in the
> next version.

What about other features? Like dma mapping optimization and frag support
in the page pool.

I understand that you use some trick in the gen_gool to avoid the per chunk
dma_addr storage in the 'struct page_pool_iov' and do not need frag support
for now.

But for other memory provider, if they need those supports, we probably need
to extend 'struct page_pool_iov' to support those features, then we may have
a 'struct page_pool_iov' to be looking alike to the sepcific union for page_pool
in 'struct page', see:

https://elixir.free-electrons.com/linux/v6.7-rc1/source/include/linux/mm_types.h#L119

We currently don't have a way to decouple the existing 'struct page' from mm
subsystem yet as my understanding, if we don't mirror the above union in the
'struct page', we may have more 'if' checking added in the future.