Re: [RFC PATCH net-next v5 2/2] net: add netmem to skb_frag_t

From: Yunsheng Lin
Date: Mon Jan 15 2024 - 04:55:24 EST


On 2024/1/12 23:35, Mina Almasry wrote:
> On Fri, Jan 12, 2024 at 3:51 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>
>> On 2024/1/12 8:34, Mina Almasry wrote:
>>> On Thu, Jan 11, 2024 at 4:45 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>>>
>>>> On 2024/1/9 9:14, Mina Almasry wrote:
>>>>
>>>> ...
>>>>
>>>>>
>>>>> + if (WARN_ON_ONCE(!skb_frag_page(&skb_shinfo(skb)->frags[0]))) {
>>>>
>>>> I am really hate to bring it up again.
>>>> If you are not willing to introduce a new helper,
>>>
>>> I'm actually more than happy to add a new helper like:
>>>
>>> static inline netmem_ref skb_frag_netmem();
>>>
>>> For future callers to obtain frag->netmem to use the netmem_ref directly.
>>>
>>> What I'm hung up on is really your follow up request:
>>>
>>> "Is it possible to introduce something like skb_frag_netmem() for
>>> netmem? so that we can keep most existing users of skb_frag_page()
>>> unchanged and avoid adding additional checking overhead for existing
>>> users."
>>>
>>> With this patchseries, skb_frag_t no longer has a page pointer inside
>>> of it, it only has a netmem_ref. The netmem_ref is currently always a
>>> page, but in the future may not be a page. Can you clarify how we keep
>>> skb_frag_page() unchanged and without checks? What do you expect
>>> skb_frag_page() and its callers to do? We can not assume netmem_ref is
>>> always a struct page. I'm happy to implement a change but I need to
>>> understand it a bit better.
>
>
> You did not answer my question that I asked here, and ignoring this
> question is preventing us from making any forward progress on this
> discussion. What do you expect or want skb_frag_page() to do when
> there is no page in the frag?

I would expect it to do nothing.
IMHO, the caller is expected to only call skb_frag_page() for the frag
with normal page, for example, doing some 'readable' checking before
callingskb_frag_page(), or not doing any checking at all if it is called
in some existing driver not support devmem.

>
>>
>> There are still many existing places still not expecting or handling
>> skb_frag_page() returning NULL, mostly those are in the drivers not
>> supporting devmem,
>
> As of this series skb_frag_page() cannot return NULL.
>
> In the devmem series, all core networking stack places where
> skb_frag_page() may return NULL are audited.
>
> skb_frag_page() returning NULL in a driver that doesn't support devmem
> is not possible. The driver is the one that creates the devmem frags

Is it possible a netdev supporting devmen and a netdev not supporting
devmen are put into the same bridge, and a rx skb from netdev supporting
devmen is forwarded to netdev not supporting devmem?

br_forward() or ip_forward() may be the place that might do this kind
of forwarding?

> in the first place. When the driver author adds devmem support, they
> should also add support for skb_frag_page() returning NULL.
>
>> what's the point of adding the extral overhead for
>> those driver?
>>
>
> There is no overhead with static branches. The checks are no-op unless
> the user enables devmem, at which point the devmem connections see no

no-op is still some cpu instruction that will be replaced by some jumping
instruction when devmem is enabled, right?

Maybe Alexander had better words for those kinds of overhead:
"The overhead may not seem like much, but when you are having to deal
with it per page and you are processing pages at millions per second it
can quickly start to add up."


> overhead and non-devmem connections will see minimal overhead that I
> suspect will not reproduce any perf issue. If the user is not fine
> with that they can simply not enable devmem and continue to not
> experience any overhead.
>
>> The networking stack should forbid skb with devmem frag being forwarded
>> to drivers not supporting devmem yet. I am sure if this is done properly
>> in your patchset yet? if not, I think you might to audit every existing
>> drivers handling skb_frag_page() returning NULL correctly.
>>
>
> There is no audit required. The devmem frags are generated by the
> driver and forwarded to the TCP stack, not the other way around.
>
>>
>>>