Re: [RFC] net: esp: fix bad handling of pages from page_pool

From: Dragos Tatulea
Date: Wed Mar 06 2024 - 11:02:20 EST


On Wed, 2024-03-06 at 07:22 -0800, Jakub Kicinski wrote:
> On Wed, 6 Mar 2024 13:05:14 +0000 Dragos Tatulea wrote:
> > On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote:
> > > On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote:
> > > > When the skb is reorganized during esp_output (!esp->inline), the pages
> > > > coming from the original skb fragments are supposed to be released back
> > > > to the system through put_page. But if the skb fragment pages are
> > > > originating from a page_pool, calling put_page on them will trigger a
> > > > page_pool leak which will eventually result in a crash.
> > >
> > > So it just does: skb_shinfo(skb)->nr_frags = 1;
> > > and assumes that's equivalent to owning a page ref on all the frags?
> > >
> > My understanding is different: it sets nr_frags to 1 because it's swapping out
> > the old page frag in fragment 0 with the new xfrag page frag and will use this
> > "new" skb from here. It does take a page reference for the xfrag page frag.
>
> Same understanding, I'm just bad at explaining :)
>
> > > Fix looks more or less good, we would need a new wrapper to avoid
> > > build issues without PAGE_POOL, 
> > >
> > Ack. Which component would be best location for this wrapper: page_pool?
>
> Hm, that's a judgment call.
> Part of me wants to put it next to napi_frag_unref(), since we
> basically need to factor out the insides of this function.
> When you post the patch the page pool crowd will give us
> their opinions.
>
Why not have napi_pp_put_page() simply return false if CONFIG_PAGE_POOL is not
set?
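
To illustrate the idea, here is a minimal userspace sketch, not the actual
patch. struct page, put_page() and the CONFIG_PAGE_POOL=y body are mocks, and
release_frag_page() is a hypothetical caller that exists only for this
example. Only the name and the (page, napi_safe) arguments of
napi_pp_put_page() are taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock of struct page: just enough state to observe what happens. */
struct page {
	int refcount;	/* mock reference count */
	bool from_pool;	/* mock marker: page belongs to a page_pool */
	bool recycled;	/* set when the mock pool takes the page back */
};

/* Mock of put_page(): drop one reference. */
static void put_page(struct page *page)
{
	page->refcount--;
}

#ifdef CONFIG_PAGE_POOL
/* Mock of the real helper: true only if the pool took the page back. */
static bool napi_pp_put_page(struct page *page, bool napi_safe)
{
	(void)napi_safe;
	if (!page->from_pool)
		return false;
	page->recycled = true;
	return true;
}
#else
/* Suggested stub: without page_pool, no page can be a pool page. */
static inline bool napi_pp_put_page(struct page *page, bool napi_safe)
{
	(void)page;
	(void)napi_safe;
	return false;
}
#endif

/* Hypothetical caller: no #ifdef needed at the call site. */
static void release_frag_page(struct page *page)
{
	if (napi_pp_put_page(page, false))
		return;
	put_page(page);
}
```

With the stub in place, a caller such as esp could use the same code path in
both configurations and would fall through to put_page() when page_pool is
compiled out.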

Regarding stable: would I need to send a separate fix that does the raw pp page
check without the API?

> > > but I wonder if we wouldn't be better
> > > off changing the other side. Instead of "cutting off" the frags -
> > > walking them and dealing with various page types. Because Mina and co.
> > > will step onto this landmine as well.
> > The page frags are still stored and used in the sg scatterlist. If we release
> > them at the moment when the skb is "cut off", the pages in the sg will be
> > invalid. At least that's my understanding.
>
> I was thinking something along the lines of:
>
> for each frag()
>         if (is_pp_page()) {
>                 get_page();
>                 page_pool_unref_page(1);
>         }
>
> so that it's trivial to insert another check for "is this a zero-copy"
> page in there, and error out. But on reflection the zero copy check may
> be better placed in __skb_to_sgvec(), so ignore this. Just respin
> what you got with a new helper.
>
Ignored. I was hoping we wouldn't go in that direction :).

Thanks,
Dragos