Re: [PATCH net-next 3/6] mm/page_alloc: use initial zero offset for page_frag_alloc_align()

From: Alexander Duyck
Date: Mon Jan 08 2024 - 11:26:17 EST


On Mon, Jan 8, 2024 at 12:59 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> On 2024/1/5 23:42, Alexander H Duyck wrote:
> > On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> >> The next patch is above to use page_frag_alloc_align() to
> >> replace vhost_net_page_frag_refill(), the main difference
> >> between those two frag page implementations is whether we
> >> use a initial zero offset or not.
> >>
> >> It seems more nature to use a initial zero offset, as it
> >> may enable more correct cache prefetching and skb frag
> >> coalescing in the networking, so change it to use initial
> >> zero offset.
> >>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> >> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> >
> > There are several advantages to running the offset as a countdown
> > rather than count-up value.
> >
> > 1. Specifically for the size of the chunks we are allocating doing it
> > from the bottom up doesn't add any value as we are jumping in large
> > enough amounts and are being used for DMA so being sequential doesn't
> > add any value.
>
> What is the expected size of the above chunks in your mind? I suppose
> that is like NAPI_HAS_SMALL_PAGE_FRAG to avoid excessive truesize
> underestimation?
>
> It seems there is no limit for min size of allocation for
> page_frag_alloc_align() now, and as the page_frag API seems to be only
> used in networking, should we enforce the min size of allocation at API
> level?

The primary use case for this is to allocate fragments to be used for
storing network data. We usually end up allocating a minimum of 1K in
most cases as you end up having to reserve something like 512B for
headroom and the skb_shared_info in an skb. In addition the slice
lengths become very hard to predict as these are usually used for
network traffic so the size can be as small as 60B for a packet or as
large as 9KB

> >
> > 2. By starting at the end and working toward zero we can use built in
> > functionality of the CPU to only have to check and see if our result
> > would be signed rather than having to load two registers with the
> > values and then compare them which saves us a few cycles. In addition
> > it saves us from having to read both the size and the offset for every
> > page.
>
> I suppose the above is ok if we only use the page_frag_alloc*() API to
> allocate memory for skb->data, not for the frag in skb_shinfo(), as by
> starting at the end and working toward zero, it means we can not do skb
> coalescing.
>
> As page_frag_alloc*() is returning va now, I am assuming most of users
> is using the API for skb->data, I guess it is ok to drop this patch for
> now.
>
> If we allow page_frag_alloc*() to return struct page, we might need this
> patch to enable coalescing.

I would argue this is not the interface for enabling coalescing. This
is one of the reasons why this is implemented the way it is. When you
are aligning fragments you aren't going to be able to coalesce the
frames anyway as the alignment would push the fragments apart.