Re: [RFC PATCH net-next v5 2/2] net: add netmem to skb_frag_t

From: Mina Almasry
Date: Wed Jan 17 2024 - 13:34:31 EST


On Wed, Jan 17, 2024 at 10:00 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> On Tue, Jan 16, 2024 at 4:16 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > On Tue, Jan 16, 2024 at 07:04:13PM +0800, Yunsheng Lin wrote:
> > > On 2024/1/16 8:01, Jason Gunthorpe wrote:
> > > > On Mon, Jan 15, 2024 at 03:23:33PM -0800, Mina Almasry wrote:
> > > >>>> You did not answer my question that I asked here, and ignoring this
> > > >>>> question is preventing us from making any forward progress on this
> > > >>>> discussion. What do you expect or want skb_frag_page() to do when
> > > >>>> there is no page in the frag?
> > > >>>
> > > >>> I would expect it to do nothing.
> > > >>
> > > >> I don't understand. skb_frag_page() with an empty implementation just
> > > >> results in a compiler error as the function needs to return a page
> > > >> pointer. Do you actually expect skb_frag_page() to unconditionally
> > > >> cast frag->netmem to a page pointer? That was explained as
> > > >> unacceptable over and over again by Jason and Christian as it risks
> > > >> casting devmem to page; completely unacceptable and will get nacked.
> > > >> Do you have a suggestion of what skb_frag_page() should do that will
> > > >> not get nacked by mm?
> > > >
> > > > WARN_ON and return NULL seems reasonable?
> > >
>
> That's more or less what I'm thinking.
>
> > > While I am agreed that it may be a nightmare to debug the case of passing
> > > a false page into the mm system, but I am not sure what's the point of
> > > returning NULL to caller if the caller is not expecting or handling
> > > the
> >
> > You have to return something and NULL will largely reliably crash the
> > thread. The WARN_ON explains in detail why your thread just crashed.
> >
>
> Agreed.
>
> > > NULL returning[for example, most of mm API called by the networking does not
> > > seems to handling NULL as input page], isn't the NULL returning will make
> > > the kernel panic anyway? Doesn't it make more sense to just add a BUG_ON()
> > > depending on some configuration like CONFIG_DEBUG_NET or CONFIG_DEVMEM?
> > > As returning NULL seems to be causing a confusion for the caller of
> > > skb_frag_page() as whether to or how to handle the NULL returning case.
> >
> > Possibly, though Linus doesn't like BUG_ON on principle..
> >
> > I think the bigger challenge is convincing people that this devmem
> > stuff doesn't just open a bunch of holes in the kernel where userspace
> > can crash it.
> >
>
> It does not, and as of right now there are no pending concerns from
> any netdev maintainers regarding mishandled devmem checks at least.
> This is because the devmem series comes with a full audit of
> skb_frag_page() callers [1] and all areas in the net stack attempting
> to access the skb [2].
>
> [1] https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-10-almasrymina@xxxxxxxxxx/
> [2] https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-11-almasrymina@xxxxxxxxxx/
>
> > The fact you all are debating what to do with skb_frag_page() suggests
> > to me there isn't confidence...
> >
>
> The debate raging on is related to the performance of skb_frag_page(),
> not correctness (and even then, I don't think it's related to
> perf...). Yunsheng would like us to optimize skb_frag_page() using an
> unconditional cast from netmem to page. This in Yunsheng's mind is a
> performance optimization as we don't need to add an if statement
> checking if the netmem is a page. I'm resistant to implement that
> change so far because:
>
> (a) unconditionally casting from netmem to page negates the compiler
> type safety that you and Christian are laying out as a requirement for
> the devmem stuff.
> (b) With likely/unlikely or static branches the check to make sure
> netmem is page is a no-op for existing use cases anyway, so AFAIU,
> there is no perf gain from optimizing it out anyway.
>

Another thought, if anyone else is concerned about the devmem checks
performance, potentially we could introduce CONFIG_NET_DEVMEM which
when disabled prevents the user from using devmem at all (disables the
netlink API).

When that is disabled, skb_frag_page(), netmem_to_page() & friends can
assume netmem is always a page and do a straight cast between netmem &
page. When it's enabled, it will check that netmem == page before
doing a cast, and return NULL if it is not a page.

I think this is technically viable and I think preserves the compiler
type safety requirements set by mm folks. From my POV though, bloating
the kernel with a a new CONFIG just to optimize out no-op checks seems
unnecessary, but if there is agreement that the checks are a concern,
adding CONFIG_NET_DEVMEM should address it while being acceptable to
mm maintainers.

> But none of this is related to correctness. Code calling
> skb_frag_page() will fail or crash if it's not handled correctly
> regardless of the implementation details of skb_frag_page(). In the
> devmem series we add support to handle it correctly via [1] & [2].
>
> --
> Thanks,
> Mina



--
Thanks,
Mina