Re: [RFC PATCH v3 07/12] page-pool: device memory support

From: Yunsheng Lin
Date: Thu Nov 09 2023 - 04:30:56 EST


On 2023/11/9 11:20, Mina Almasry wrote:
> On Wed, Nov 8, 2023 at 2:56 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:

>
> Agreed everything above is undoable.
>
>> But we might be able to do something as folio is doing now, mm subsystem
>> is still seeing 'struct folio/page', but other subsystem like slab is using
>> 'struct slab', and there is still some common fields shared between
>> 'struct folio' and 'struct slab'.
>>
>
> In my eyes this is almost exactly what I suggested in RFC v1 and got
> immediately nacked with no room to negotiate. What we did for v1 is to
> allocate struct pages for dma-buf to make dma-bufs look like struct
> page to mm subsystem. Almost exactly what you're describing above.

Maybe the above is where we have disagreement:
Do we still need make dma-bufs look like struct page to mm subsystem?
IMHO, the answer is no. We might only need to make dma-bufs look like
struct page to net stack and page pool subsystem. I think that is already
what this pacthset is trying to do, what I am suggesting is just make
it more like 'struct page' to net stack and page pool subsystem, in order
to try to avoid most of the 'if' checking in net stack and page pool
subsystem.

> It's a no-go. I don't think renaming struct page to netmem is going to
> move the needle (it also re-introduces code-churn). What I feel like I
> learnt is that dma-bufs are not struct pages and can't be made to look
> like one, I think.
>
>> As the netmem patchset, is devmem able to reuse the below 'struct netmem'
>> and rename it to 'struct page_pool_iov'?
>
> I don't think so. For the reasons above, but also practically it
> immediately falls apart. Consider this field in netmem:
>
> + * @flags: The same as the page flags. Do not use directly.
>
> dma-buf don't have or support page-flags, and making dma-buf looks
> like they support page flags or any page-like features (other than
> dma_addr) seems extremely unacceptable to mm folks.

As far as I tell, as we limit the devmem usage in netstack, the below
is the related mm function call for 'struct page' for devmem:
page_ref_*(): page->_refcount does not need changing
page_is_pfmemalloc(): which is corresponding to page->pp_magic, and
devmem provider can set/unset it in it's 'alloc_pages'
ops.
page_to_nid(): we may need to handle it differently somewhat like this
patch does as page_to_nid() may has different implementation
based on different configuration.
page_pool_iov_put_many(): as mentioned in other thread, if net stack is not
calling page_pool_page_put_many() directly, we
can reuse napi_pp_put_page() for devmem too, and
handle the special case for devmem in 'release_page'
ops.

>
>> So that 'struct page' for normal
>> memory and 'struct page_pool_iov' for devmem share the common fields used
>> by page pool and net stack?
>
> Are you suggesting that we'd cast a netmem* to a page* and call core
> mm APIs on it? It's basically what was happening with RFC v1, where
> things that are not struct pages were made to look like struct pages.
>
> Also, there isn't much upside for what you're suggesting, I think. For
> example I can align the refcount variable in struct page_pool_iov with
> the refcount in struct page so that this works:
>
> put_page((struct page*)ppiov);
>
> but it's a disaster. Because put_page() will call __put_page() if the
> page is freed, and __put_page() will try to return the page to the
> buddy allocator!

As what I suggested above, Can we handle this in devmem provider's
'release_page' ops instead of calling put_page() directly as for devmem.

>
>> And we might be able to reuse the 'flags',
>> '_pp_mapping_pad' and '_mapcount' for specific mem provider, which is enough
>> for the devmem only requiring a single pointer to point to it's
>> owner?
>>
>
> All the above seems quite similar to RFC v1 again, using netmem
> instead of struct page. In RFC v1 we re-used zone_device_data() for
> the dma-buf owner equivalent.

As we have added a few checkings to limit 'struct page' for devmem to
be only used in net stack, we can decouple 'struct page' for devmem
from mm subsystem, zone_device_data() is not really needed, right?

If we can decouple 'struct page' for normal memory from mm subsystem
through the folio work in the future, then we may define a more abstract
structure for page pool and net stack instead of reusing 'struct page'
from mm.

>