Re: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag

From: Jesper Dangaard Brouer
Date: Fri Jun 16 2023 - 15:01:13 EST




On 16/06/2023 17.01, Alexander Duyck wrote:
On Fri, Jun 16, 2023 at 5:21 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:

On 2023/6/16 2:26, Alexander Duyck wrote:
On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:

On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
[...]

I like your patches as they isolate the drivers from having to make the
fragmentation decisions based on the system page size (4k vs 64k but
we're hearing more and more about ARM w/ 16k pages). For that use case
this is great.

+1

[...]

In the case of the standard page size being 4K a standard page would
just have to take on the CPU overhead of the atomic_set and
atomic_read for pp_ref_count (new name) which should be minimal as on
most sane systems those just end up being a memory write and read.

If I understand you correctly, I think what you are trying to do
may break some of Jesper' benchmarking:)

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c

So? If it breaks an out-of-tree benchmark the benchmark can always be
fixed.

It doesn't matter if this is out-of-tree (I should have upstreamed it when AKPM asked me to.)

Point is don't break my page_pool fast-path!!! :-P

The point is enabling a use case that can add value across the
board instead of trying to force the community to support a niche use
case.

I'm all for creating a new API, lets call it netmem, that takes care of this use-case.
I'm *not* okay with this new API slowing down the page_pool fast-path.

Why not multiplex on a MEM_TYPE, like XDP_MEM_TYPE is prepared for?!?
Meaning the caller can choose which is the correct API call.
(thus, we can stay away from adding code to fast-path case)

See below, copy-paste of code that shows what I mean by multiplex on a MEM_TYPE.


Ideally we should get away from using the pages directly for most
cases in page pool. In my mind the page pool should start operating
more like __get_free_pages where what you get is a virtual address
instead of the actual page. That way we could start abstracting it
away and eventually get to something more like a true page_pool api
instead of what feels like a set of add-ons for the page allocator.

Yes, I agree with Alex Duyck here.
Like when I looked at veth proposed changes, it also felt like a virtual address would be better than a page.

addr = netmem_alloc(rq->page_pool, &truesize);

Although at the end of the day this still feels more like we are just
reimplementing slab so it is hard for me to say this is necessarily
the best solution either.

Yes, we have to be careful not to re-implement the MM layer in network land ;-)

(below code copy-paste broke whitespaces)

$ git show
commit fe38c642d629f8361f76b25aa8732e5e331d0925 (HEAD -> pp_rm_workqueue04)
Author: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
Date: Fri Jun 16 20:54:08 2023 +0200

page_pool: code examplifying multiplexing on mem_type

Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>

diff --git a/include/net/xdp.h b/include/net/xdp.h
index d1c5381fc95f..c02ac82a1d79 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -42,6 +42,7 @@ enum xdp_mem_type {
MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */
MEM_TYPE_PAGE_POOL,
MEM_TYPE_XSK_BUFF_POOL,
+ MEM_TYPE_PP_NETMEM,
MEM_TYPE_MAX,
};

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d03448a4c411..68be76efef00 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -353,7 +353,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
struct page *page)
{
page->pp = pool;
- page->pp_magic |= PP_SIGNATURE;
+ page->pp_magic |= PP_SIGNATURE | (MEM_TYPE_PAGE_POOL << 8);
if (pool->p.init_callback)
pool->p.init_callback(page, pool->p.init_arg);
}
@@ -981,6 +981,7 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
struct napi_struct *napi;
struct page_pool *pp;
bool allow_direct;
+ int mem_type;

page = compound_head(page);

@@ -991,9 +992,10 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
* and page_is_pfmemalloc() is checked in __page_pool_put_page()
* to avoid recycling the pfmemalloc page.
*/
- if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+ if (unlikely((page->pp_magic & ~0xF03UL) != PP_SIGNATURE))
return false;

+ mem_type = (page->pp_magic & 0xF00) >> 8;
pp = page->pp;

/* Allow direct recycle if we have reasons to believe that we are
@@ -1009,7 +1011,10 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
* The page will be returned to the pool here regardless of the
* 'flipped' fragment being in use or not.
*/
- page_pool_put_full_page(pp, page, allow_direct);
+ if (mem_type == MEM_TYPE_PP_NETMEM)
+ pp_netmem_put_page(pp, page, allow_direct);
+ else
+ page_pool_put_full_page(pp, page, allow_direct);

return true;
}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 41e5ca8643ec..dc4bfbe8f002 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -380,6 +380,11 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
struct page *page;

switch (mem->type) {
+ case MEM_TYPE_PP_NETMEM:
+ if (napi_direct && xdp_return_frame_no_direct())
+ napi_direct = false;
+ pp_netmem_put(page->pp, data, napi_direct);
+ break;
case MEM_TYPE_PAGE_POOL:
page = virt_to_head_page(data);
if (napi_direct && xdp_return_frame_no_direct())