Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA

From: Ilias Apalodimas
Date: Fri Aug 18 2023 - 02:13:41 EST


On Fri, 18 Aug 2023 at 02:57, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
> > > Can we assume the DMA mapping of page pool is page aligned? We should
> > > be, right?
> >
> > Yes
> >
> > > That means we're storing 12 bits of 0 at the lower end.
> > > So even with 32b of space we can easily store addresses for 32b+12b =>
> > > 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
> > > problem is only in our heads?
> >
> > Do you mean moving the pp_frag_count there?
>
> Right, IIUC we don't have enough space to fit dma_addr_t and the
> refcount, but if we store the dma addr on a shifted u32 instead
> of using dma_addr_t explicitly - the refcount should fit?

struct page looks like this:

unsigned long dma_addr;
union {
unsigned long dma_addr_upper;
atomic_long_t pp_frag_count;
};

So, on 32bit platforms with 64bit dma we can't support a frag count at all.
We could either use the lower 12 bits (and have support for 4096 frags
'only') or do what you suggest.
TBH I don't love any of these and since those platforms are rare (or
at least that's what I think), I prefer not supporting them at all.

>
> > I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
> > overall. With Yunshengs patches such a platform would allocate a
> > page, so why should we prevent it from splitting it internally?
>
> Splitting it is fine, the problem is that the refcount AKA
> page->pp_frag_count** counts outstanding PP-aware references
> and page->refcount counts PP-unaware references.
>
> If we want to use page->refcount directly we'd need to unmap
> the page whenever drivers calls page_pool_defrag_page().
> But the driver may assume the page is still mapped afterwards.

What I am suggesting here is to not add the new
PP_FLAG_PAGE_SPLIT_IN_DRIVER flag. If a driver wants to split pages
internally it should create a pool without
PP_FLAG_DMA_SYNC_DEV to begin with. The only responsibility the
driver would have is to elevate the page refcnt so page pool would not
try to free/recycle it. Since it won't be able to allocate fragments
we don't have to worry about the rest.

> We can change the API to make this behavior explicit. Although
> IMHO that's putting the burden of rare platforms on non-rare
> platforms which we should avoid.

Yep, agree here.

>
> ** I said it before and I will keep saying this until someone gets
> angry at me - I really think we should rename this field because
> the association with frags is a coincidence.

I had similar confusions everytime I had to re-read our code hence git
show 4d4266e3fd32
Any suggestions?

Thanks
/Ilias