Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)

From: Jesper Dangaard Brouer
Date: Tue Jun 20 2023 - 11:16:19 EST




On 19/06/2023 20.07, Jakub Kicinski wrote:
> On Fri, 16 Jun 2023 22:42:35 +0200 Jesper Dangaard Brouer wrote:
>>> Former is better for huge pages, latter is better for IO mem
>>> (peer-to-peer DMA). I wonder if you have different use case which
>>> requires a different model :(

>> I want the network stack SKBs (and XDP) to support different memory
>> types for the "head" frame and the "data-frags". Eric has described
>> this idea before: hardware will do header-split, so the TCP data part
>> lands in another page/frag, making it faster for TCP-streams, but
>> this can be used for much more.
>>
>> My proposed use-cases involve more than TCP. We can easily imagine
>> NVMe protocol header-split, where the data-frag could be a mem_type
>> that actually belongs to the harddisk (maybe the CPU cannot even read
>> it). The same scenario goes for GPU memory, which is the AI use-case.
>> IIRC Jonathan has previously sent patches for the GPU use-case.
>>
>> I really hope we can work in this direction together,

> Perfect, that's also the use case I had in mind. The huge page thing
> was just a quick thing to implement as a PoC (although useful in its
> own right, one day I'll find the time to finish it, sigh).
>
> That said I couldn't convince myself that for a peer-to-peer setup we
> have enough space in struct page to store all the information we need.
> Or that we'd get a struct page at all, and not just a region of memory
> with no struct page * allocated :S

Big ideas are good, but I think we should start smaller and evolve.


> That'd require serious surgery on the page pool's fast paths to work
> around.
>
> I haven't dug into the details, tho. If you think we can use page pool
> as a frontend for iouring and/or p2p memory that'd be awesome!


Hmm... I don't like the sound of this.
My point is that we should create a more pluggable memory system for
the netstack, and NOT try to extend page_pool to cover all use-cases.

> The workaround solution I had in mind would be to create a narrower API
> for just data pages. Since we'd need to sprinkle ifs anyway, pull them
> up close to the call site. Allowing to switch page pool for a
> completely different implementation, like the one Jonathan coded up for
> iouring. Basically
>
> $name_alloc_page(queue)
> {
>         if (queue->pp)
>                 return page_pool_dev_alloc_pages(queue->pp);
>         else if (queue->iouring..)
>                 ...
> }

Yes, this is more the direction I'm thinking in.
In many cases you don't need this if-statement helper in the driver, as
the driver's RX-side code will know upfront which API is in use.
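
Rough sketch of what I mean (the mydrv_* names and queue layout are
made up for the example; page_pool_dev_alloc_pages() is the real API
from <net/page_pool.h>): an RX ring that was bound to a page_pool at
setup time can call the allocator directly, with no runtime branch on
the memory type:

struct mydrv_rx_queue {                 /* made-up driver structure */
        struct page_pool *pp;           /* bound to this ring at setup time */
        struct page *ring[256];         /* made-up descriptor ring */
        unsigned int fill_level;
        unsigned int fill_target;
};

static int mydrv_rx_refill(struct mydrv_rx_queue *rxq)
{
        while (rxq->fill_level < rxq->fill_target) {
                struct page *page = page_pool_dev_alloc_pages(rxq->pp);

                if (!page)
                        return -ENOMEM;

                rxq->ring[rxq->fill_level++] = page;
        }
        return 0;
}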

The TX completion side will need this kind of multiplexing return
helper, to return pages to the correct memory allocator type (with
page_pool being one of them). See the concept in __xdp_return() [1].
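
To make that concrete, a stripped-down sketch of such a return helper,
loosely modeled on __xdp_return() (netmem_return() is a made-up name;
enum xdp_mem_type, page->pp, page_pool_put_full_page() and put_page()
are the real kernel symbols, from <net/xdp.h> and <net/page_pool.h>):

static void netmem_return(struct page *page, enum xdp_mem_type type)
{
        switch (type) {
        case MEM_TYPE_PAGE_POOL:
                /* page_pool pages carry their owning pool in page->pp */
                page_pool_put_full_page(page->pp, page, false);
                break;
        case MEM_TYPE_PAGE_ORDER0:
                /* Plain page allocator memory */
                put_page(page);
                break;
        default:
                /* Future providers (iouring, p2p/GPU mem) become new cases */
                WARN_ON_ONCE(1);
                break;
        }
}

A new memory provider then only adds a case label plus its own
put/recycle call.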

Performance-wise, function pointers are slow due to RETPOLINE, but a
switch-case statement (below a certain size) becomes a jump table,
which is fast. See [1].

[1] https://elixir.bootlin.com/linux/v6.4-rc7/source/net/core/xdp.c#L377
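
For the performance point: the netmem_return() sketch above is the
switch-style dispatch. The function-pointer alternative it avoids would
look something like this (again made up for illustration), where every
call goes through an indirect branch and thus a retpoline thunk on
affected CPUs:

/* Ops-style provider interface (illustrative only) */
struct mem_ops {
        void (*put)(struct page *page);
};

static void return_via_ops(const struct mem_ops *ops, struct page *page)
{
        /* Indirect call: RETPOLINE turns this into a (slow) thunk */
        ops->put(page);
}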

Regarding room in "struct page", notice that page->pp_magic will have
plenty of room for e.g. storing xdp_mem_type or even xdp_mem_info
(which also contains an ID).
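
A rough sketch of the kind of encoding I mean (page->pp_magic,
PP_SIGNATURE and enum xdp_mem_type are real; the NETMEM_* layout and
helper names are just assumptions for the example, and the existing
pp_magic sanity check, which only masks out the low bits, would need to
be relaxed to tolerate the extra bits):

/* Hypothetical layout: keep PP_SIGNATURE and pack the memory type into
 * bits of pp_magic that the signature leaves as zero.
 */
#define NETMEM_TYPE_SHIFT       8
#define NETMEM_TYPE_MASK        0xffUL

static void netmem_set_page_type(struct page *page, enum xdp_mem_type type)
{
        page->pp_magic = PP_SIGNATURE |
                         ((unsigned long)type << NETMEM_TYPE_SHIFT);
}

static enum xdp_mem_type netmem_get_page_type(const struct page *page)
{
        return (enum xdp_mem_type)
                ((page->pp_magic >> NETMEM_TYPE_SHIFT) & NETMEM_TYPE_MASK);
}

Packing an xdp_mem_info-style ID in as well would work the same way,
just with a second shifted field.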

--Jesper