Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)

From: Jason Gunthorpe
Date: Mon Jul 03 2023 - 17:43:23 EST


On Sun, Jul 02, 2023 at 11:22:33PM -0700, Mina Almasry wrote:
> On Sun, Jul 2, 2023 at 9:20 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
> >
> > On 6/29/23 8:27 PM, Mina Almasry wrote:
> > >
> > > Hello Jakub, I'm looking into device memory (peer-to-peer) networking
> > > actually, and I plan to pursue using the page pool as a front end.
> > >
> > > Quick description of what I have so far:
> > > current implementation uses device memory with struct pages; I am
> > > putting all those pages in a gen_pool, and we have written an
> > > allocator that allocates pages from the gen_pool. In the driver, we
> > > use this allocator instead of alloc_page() (the driver in question is
> > > gve which currently doesn't use the page pool). When the driver is
> > > done with the p2p page, it simply decrements the refcount on it and
> > > the page is freed back to the gen_pool.
>
> Quick update here, I was able to get my implementation working with
> the page pool as a front end with the memory provider API Jakub wrote
> here:
> https://github.com/kuba-moo/linux/tree/pp-providers
>
> The main complication indeed was the fact that my device memory pages
> are ZONE_DEVICE pages, which are incompatible with the page_pool due
> to the union in struct page. I thought of a couple of approaches to
> resolve that.
>
> 1. Make my device memory pages non-ZONE_DEVICE pages.

Hard no on this from a mm perspective.. We need P2P memory to be
properly tagged and have the expected struct pages to be DMA mappable
and otherwise, you totally break everything if you try to do this..

> 2. Convert the pages from ZONE_DEVICE pages to page_pool pages and
> vice versa as they're being inserted and removed from the page pool.

This is kind of scary, it is very, very, fragile to rework the pages
like this. Eg what happens when the owning device unplugs and needs to
revoke these pages? I think it would likely crash..

I think it also technically breaks the DMA API as we may need to look
into the pgmap to do cache ops on some architectures.

I suggest you try to work with 8k folios and then the tail page's
struct page is empty enough to store the information you need..
Or allocate per page memory and do a memdesc like thing..

Though overall, you won't find devices creating struct pages for their
P2P memory today, so I'm not sure what the purpose is. Jonathan
already got highly slammed for proposing code to the kernel that was
unusable. Please don't repeat that. Other than a special NVMe use case
the interface for P2P is DMABUF right now and it is not struct page
backed.

Even if we did get to struct pages for device memory, it is highly
likely cases you are interested in will be using larger than 4k
folios, so page pool would need to cope with this nicely as well.

Jason