Re: [PATCH net-next v3 09/12] iavf: switch to Page Pool

From: Alexander Duyck
Date: Fri Jun 02 2023 - 14:00:48 EST


On Fri, Jun 2, 2023 at 9:31 AM Alexander Lobakin
<aleksander.lobakin@xxxxxxxxx> wrote:
>
> From: Alexander H Duyck <alexander.duyck@xxxxxxxxx>
> Date: Wed, 31 May 2023 09:19:06 -0700
>
> > On Tue, 2023-05-30 at 17:00 +0200, Alexander Lobakin wrote:
> >> Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
> >> no custom recycling logic and one whole page per frame, it can easily
> >> be switched to using the Page Pool API instead.
>
> [...]
>
> >> @@ -691,8 +690,6 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)
> >> **/
> >> void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
> >> {
> >> - u16 i;
> >> -
> >> /* ring already cleared, nothing to do */
> >> if (!rx_ring->rx_pages)
> >> return;
> >> @@ -703,28 +700,17 @@ void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
> >> }
> >>
> >> /* Free all the Rx ring sk_buffs */
> >> - for (i = 0; i < rx_ring->count; i++) {
> >> + for (u32 i = 0; i < rx_ring->count; i++) {
> >
> > Did we make a change to our coding style to allow declaration of
> > variables inside of for statements? Just wondering if this is a change
> > since the recent updates to the ISO C standard, or if this doesn't
> > match up with what we would expect per the coding standard.
>
> It's optional right now; nobody would object to declaring it either way.
> Doing it inside has been allowed since we switched to C11, right.
> Here I did it because it broke my heart to see this little u16 sitting
> alone (and yeah, a u16 on the stack).

Yeah, that was back when I was declaring stack variables the exact
same size as the ring parameters. So the u16 should match the size of
rx_ring->count, not that it matters. It was just a quirk I had at the
time.

> >
> >> struct page *page = rx_ring->rx_pages[i];
> >> - dma_addr_t dma;
> >>
> >> if (!page)
> >> continue;
> >>
> >> - dma = page_pool_get_dma_addr(page);
> >> -
> >> /* Invalidate cache lines that may have been written to by
> >> * device so that we avoid corrupting memory.
> >> */
> >> - dma_sync_single_range_for_cpu(rx_ring->dev, dma,
> >> - LIBIE_SKB_HEADROOM,
> >> - LIBIE_RX_BUF_LEN,
> >> - DMA_FROM_DEVICE);
> >> -
> >> - /* free resources associated with mapping */
> >> - dma_unmap_page_attrs(rx_ring->dev, dma, LIBIE_RX_TRUESIZE,
> >> - DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);
> >> -
> >> - __free_page(page);
> >> + page_pool_dma_sync_full_for_cpu(rx_ring->pool, page);
> >> + page_pool_put_full_page(rx_ring->pool, page, false);
> >> }
> >>
> >> rx_ring->next_to_clean = 0;
> >> @@ -739,10 +725,15 @@ void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
> >> **/
> >> void iavf_free_rx_resources(struct iavf_ring *rx_ring)
> >> {
> >> + struct device *dev = rx_ring->pool->p.dev;
> >> +
> >> iavf_clean_rx_ring(rx_ring);
> >> kfree(rx_ring->rx_pages);
> >> rx_ring->rx_pages = NULL;
> >>
> >> + page_pool_destroy(rx_ring->pool);
> >> + rx_ring->dev = dev;
> >> +
> >> if (rx_ring->desc) {
> >> dma_free_coherent(rx_ring->dev, rx_ring->size,
> >> rx_ring->desc, rx_ring->dma);
> >
> > Not a fan of this switching back and forth between being a page pool
> > pointer and a dev pointer. It seems problematic, as it is easily
> > misinterpreted. I would say that, at a minimum, stick to either
> > page_pool (Rx) or dev (Tx) on a per-ring-type basis.
>
> The problem is that the page_pool's lifetime spans ifup to ifdown, while
> its ring lives longer. So I had to do something about this, but I also
> didn't want to keep two pointers at the same time, since that's redundant
> and adds 8 bytes to the ring for nothing.

It might be better to just go with NULL rather than populating it with
two different possible values. Then at least you know that if it is an
rx_ring the pointer is a page_pool, and if it is a tx_ring it is a dev.
You can reset it to the page pool when you repopulate the rest of the
ring.
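
Roughly what I have in mind, as a sketch on top of your hunk above (only
the NULL handling is new, the rest is lifted from your patch, so take
the exact shape with a grain of salt):

void iavf_free_rx_resources(struct iavf_ring *rx_ring)
{
	/* grab the device before the pool goes away */
	struct device *dev = rx_ring->pool->p.dev;

	iavf_clean_rx_ring(rx_ring);
	kfree(rx_ring->rx_pages);
	rx_ring->rx_pages = NULL;

	page_pool_destroy(rx_ring->pool);
	/* no pool outside ifup..ifdown, so don't pretend there is one */
	rx_ring->pool = NULL;

	if (rx_ring->desc) {
		dma_free_coherent(dev, rx_ring->size,
				  rx_ring->desc, rx_ring->dma);
		rx_ring->desc = NULL;
	}
}

Then iavf_setup_rx_descriptors() would set rx_ring->pool again when it
(re)creates the pool, so the union member is either a valid page_pool or
NULL for an Rx ring, and only ever a dev for a Tx ring.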

> > This setup works for iavf; however, for i40e/ice you may run into issues,
> > since the setup_rx_descriptors call is also used to set up the ethtool
> > loopback test without a napi struct, as I recall, so there may not be a
> > q_vector.
>
> I'll handle that. Somehow :D Thanks for noticing; I'll take a look at
> whether I should do something right now or whether it can be done later,
> when switching over the drivers you mentioned.
>
> [...]
>
> >> @@ -240,7 +237,10 @@ struct iavf_rx_queue_stats {
> >> struct iavf_ring {
> >> struct iavf_ring *next; /* pointer to next ring in q_vector */
> >> void *desc; /* Descriptor ring memory */
> >> - struct device *dev; /* Used for DMA mapping */
> >> + union {
> >> + struct page_pool *pool; /* Used for Rx page management */
> >> + struct device *dev; /* Used for DMA mapping on Tx */
> >> + };
> >> struct net_device *netdev; /* netdev ring maps to */
> >> union {
> >> struct iavf_tx_buffer *tx_bi;
> >
> > Would it make more sense to have the page pool in the q_vector rather
> > than the ring? Essentially the page pool is associated with a napi
> > instance, so it seems like it would make more sense to store it with the
> > napi struct rather than potentially have multiple instances per napi.
>
> As per the Page Pool design, you should have it per ring. Plus you have
> rxq_info (an XDP-related structure), which is also per-ring and
> participates in recycling in some cases. So I wouldn't complicate things.
> I went down the chain and haven't found any place where having more than
> one PP per NAPI would break anything. If I got it right, Jakub's
> optimization discourages having one PP shared by several NAPIs (or
> scheduling one NAPI on different CPUs), but not the other way around. The
> goal was to exclude concurrent access to one PP from different threads,
> and here that's impossible.

The xdp_rxq can be mapped many:1 to the page pool if I am not mistaken.

The only reason I am a fan of keeping the page_pool tightly associated
with the napi instance is that the napi instance is what essentially
guarantees the page_pool stays consistent, since it is only accessed by
that one napi instance.
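
To be concrete, what I mean by "associated" is roughly the .napi binding
Jakub added to the pool params. A rough sketch of the create path -- the
q_vector/napi naming here is a guess and may not match what your series
actually has:

	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= rx_ring->count,
		.nid		= NUMA_NO_NODE,
		.dev		= rx_ring->netdev->dev.parent,
		/* tie direct recycling to this ring's NAPI context */
		.napi		= &rx_ring->q_vector->napi,
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= PAGE_SIZE,
	};
	struct page_pool *pool;

	pool = page_pool_create(&pp);
	if (IS_ERR(pool))
		return PTR_ERR(pool);

	rx_ring->pool = pool;

As long as each of these pools is only ever filled and drained from that
one napi poll routine, several pools per napi (one per ring) should be
fine, which matches what you said above about more than one PP per NAPI
not breaking anything.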

> Lemme know. I can always disable the NAPI optimization for cases where
> one vector is shared by several queues -- not a usual case for these NICs
> anyway -- but I haven't found a reason to do that.

I suppose we should be fine if we have a many-to-one mapping, though.
As you said, the issue would be if multiple NAPI instances were
accessing the same page pool.