Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc() API

From: Yunsheng Lin
Date: Thu Jun 15 2023 - 02:40:44 EST


On 2023/6/14 22:18, Alexander Duyck wrote:
> On Tue, Jun 13, 2023 at 8:51 PM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>
>> On 2023/6/13 22:36, Alexander Duyck wrote:
>>> On Fri, Jun 9, 2023 at 6:20 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>
>> ...
>>
>>>>
>>>> +static inline struct page *page_pool_alloc(struct page_pool *pool,
>>>> + unsigned int *offset,
>>>> + unsigned int *size, gfp_t gfp)
>>>> +{
>>>> + unsigned int max_size = PAGE_SIZE << pool->p.order;
>>>> + struct page *page;
>>>> +
>>>> + *size = ALIGN(*size, dma_get_cache_alignment());
>>>> +
>>>> + if (WARN_ON(*size > max_size))
>>>> + return NULL;
>>>> +
>>>> + if ((*size << 1) > max_size || PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
>>>> + *size = max_size;
>>>> + *offset = 0;
>>>> + return page_pool_alloc_pages(pool, gfp);
>>>> + }
>>>> +
>>>> + page = __page_pool_alloc_frag(pool, offset, *size, gfp);
>>>> + if (unlikely(!page))
>>>> + return NULL;
>>>> +
>>>> + /* There is very likely not enough space for another frag, so append the
>>>> + * remaining size to the current frag to avoid truesize underestimate
>>>> + * problem.
>>>> + */
>>>> + if (pool->frag_offset + *size > max_size) {
>>>> + *size = max_size - *offset;
>>>> + pool->frag_offset = max_size;
>>>> + }
>>>> +
>>>
>>> Rather than preventing a truesize underestimation this will cause one.
>>> You are adding memory to the size of the page reserved and not
>>> accounting for it anywhere as this isn't reported up to the network
>>> stack. I would suggest dropping this from your patch.
>>
>> I was thinking about the driver author reporting it up to the network
>> stack using the new API as something like below:
>>
>> int truesize = size;
>> struct page *page;
>> int offset;
>>
>> page = page_pool_dev_alloc(pool, &offset, &truesize);
>> if (unlikely(!page))
>> goto err;
>>
>> skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
>> offset, size, truesize);
>>
>> and similiar handling for *_build_skb() case too.
>>
>> Does it make senses for that? or did I miss something obvious here?
>
> It is more the fact that you are creating a solution in search of a
> problem. As I said before most of the drivers will deal with these
> sorts of issues by just doing the fragmentation themselves or
> allocating fixed size frags and knowing how it will be divided into
> the page.

It seems that there are already some drivers which using the page pool
API with different frag size for almost every calling, the virtio_net
and veth are the obvious ones.

When reviewing the page frag support for virtio_net, I found that it
was manipulating the page_pool->frag_offset directly to do something
as this patch does, see:

https://lore.kernel.org/lkml/CAKhg4tL9PrUebqQHL+s7A6-xqNnju3erNQejMr7UFjwTaOduZw@xxxxxxxxxxxxxx/

I am not sure we are both agreed that drivers should not be manipulating
the page_pool->frag_offset directly unless it is really necessary?

For the specific case for virtio_net, it seems we have the below options:
1. both the driver and page pool do not handle it.
2. the driver handles it by manipulating the page_pool->frag_offset
directly.
3. the page pool handles it as this patch does.

Is there any other options I missed for the specific case for virtio_net?
What is your perfer option? And why?

>
> If you are going to go down this path then you should have a consumer
> for the API and fully implement it instead of taking half measures and
> making truesize underreporting worse by evicting pages earlier.

I am not sure I understand what do you mean by "a consumer for the API",
Do you mean adding a new API something like page_pool_free() to do
something ligthweight, such as decrementing the frag user and adjusting
the frag_offset, which is corresponding to the page_pool_alloc() API
introduced in this patch?
If yes, I was considering about that before, but I am not sure it worth
the effort, as for most usecase, it is a very rare case for error handling
as my understanding.

I just note that we already have page_pool_free() used by the page pool
destroy process,we might need to do something to avoid the confusion
between page_pool_alloc() and page_pool_free() :(

>
> .
>