Re: [PATCH V3,net-next] net: mana: Add page pool for RX buffers

From: Jesper Dangaard Brouer
Date: Tue Jul 25 2023 - 14:02:10 EST




On 24/07/2023 20.35, Haiyang Zhang wrote:

[...]
On 21/07/2023 21.05, Haiyang Zhang wrote:
Add a page pool for RX buffers to cycle buffers faster and reduce CPU
usage.

The standard page pool API is used.

Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
---
V3:
Update xdp mem model, pool param, alloc as suggested by Jakub Kicinski
V2:
Use the standard page pool API as suggested by Jesper Dangaard Brouer

---
drivers/net/ethernet/microsoft/mana/mana_en.c | 91 +++++++++++++++----
include/net/mana/mana.h | 3 +
2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a499e460594b..4307f25f8c7a 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[...]
@@ -1659,6 +1679,8 @@ static void mana_poll_rx_cq(struct mana_cq *cq)

 	if (rxq->xdp_flush)
 		xdp_do_flush();
+
+	page_pool_nid_changed(rxq->page_pool, numa_mem_id());

I don't think this page_pool_nid_changed() call is needed if you do
as I suggest below (nid = NUMA_NO_NODE).


}

static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
[...]

@@ -2008,6 +2041,25 @@ static int mana_push_wqe(struct mana_rxq *rxq)
return 0;
}

+static int mana_create_page_pool(struct mana_rxq *rxq)
+{
+ struct page_pool_params pprm = {};

You are implicitly assigning NUMA node id zero.

+ int ret;
+
+ pprm.pool_size = RX_BUFFERS_PER_QUEUE;
+ pprm.napi = &rxq->rx_cq.napi;

You likely want to set pprm.nid to NUMA_NO_NODE:

pprm.nid = NUMA_NO_NODE;

For most drivers it is recommended to assign ``NUMA_NO_NODE`` (value -1)
as the NUMA ID to ``pp_params.nid``. When ``CONFIG_NUMA`` is enabled,
this setting will automatically select the (preferred) NUMA node (via
``numa_mem_id()``) based on where NAPI RX-processing is currently
running. The effect is that page_pool will only use recycled memory when
the NUMA node matches the running CPU. This assumes the CPU refilling the
driver RX-ring will also run RX-NAPI.

If a driver wants more control over the NUMA node memory selection,
it can assign something other than ``NUMA_NO_NODE`` to
(``pp_params.nid``) and adjust it at runtime via the function
``page_pool_nid_changed()``.

Our driver uses NUMA node 0 by default, so I implicitly assign NUMA node id
zero during pool init.

And if the IRQ/CPU affinity is changed, page_pool_nid_changed()
will update the nid for the pool. Does this sound good?


Also, since our driver gets the default node from here:
gc->numa_node = dev_to_node(&pdev->dev);
I will update this patch to set the default node as above, instead of implicitly
assigning it to 0.


In that case, I agree that it makes sense to use dev_to_node(&pdev->dev), like:
pprm.nid = dev_to_node(&pdev->dev);

The driver must have a reason for assigning gc->numa_node for this hardware,
which is okay. That is why the page_pool API allows the driver to control this.

But then I don't think you should call page_pool_nid_changed() like

page_pool_nid_changed(rxq->page_pool, numa_mem_id());

Because then you will (at the first packet-processing event) revert the
dev_to_node() setting to the numa_mem_id() of the processing/running CPU.
(In effect, this will be the same as setting NUMA_NO_NODE.)

I know mlx5 does call page_pool_nid_changed(), but they showed benchmark
numbers indicating this was the preferred action, even when the sysadmin
had "misconfigured" the default smp_affinity so that RX-processing happens
on a remote NUMA node. AFAIK mlx5 keeps the descriptor rings on the
originally configured NUMA node that corresponds to the NIC PCIe slot.

--Jesper