RE: [PATCH] xsk: Use pool->dma_pages to check for DMA

From: John Fastabend
Date: Mon Apr 24 2023 - 15:07:04 EST


Kal Conley wrote:
> Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an
> active DMA mapping. pool->dma_pages needs to be read anyway to access
> the map so this compiles to more efficient code.

Was it noticeable in some sort of performance test?
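For what it's worth, as far as I can tell the two checks are interchangeable only under the invariant that dma_pages is NULL exactly when dma_pages_cnt is 0. Here is a minimal userspace sketch of that assumption. struct pool_model and the helper names are hypothetical stand-ins, not the real xsk_buff_pool layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xsk_buff_pool: only the two fields
 * the patch touches. Assumed invariant: dma_pages == NULL exactly when
 * dma_pages_cnt == 0. */
struct pool_model {
	uint64_t *dma_pages;
	uint32_t dma_pages_cnt;
};

/* Old check: loads the count; the caller then loads the pointer anyway
 * to index the map. */
static bool has_dma_by_cnt(const struct pool_model *p)
{
	return p->dma_pages_cnt != 0;
}

/* New check: loads the same pointer the caller is about to use. */
static bool has_dma_by_ptr(const struct pool_model *p)
{
	return p->dma_pages != NULL;
}
```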

>
> Signed-off-by: Kal Conley <kal.conley@xxxxxxxxxxx>
> Acked-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> ---
> include/net/xsk_buff_pool.h | 2 +-
> net/xdp/xsk_buff_pool.c | 7 ++++---
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index d318c769b445..a8d7b8a3688a 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
>  	if (likely(!cross_pg))
>  		return false;
>
> -	return pool->dma_pages_cnt &&
> +	return pool->dma_pages &&
>  	       !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
>  }
>
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index b2df1e0f8153..26f6d304451e 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -350,7 +350,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>  {
>  	struct xsk_dma_map *dma_map;
>
> -	if (pool->dma_pages_cnt == 0)
> +	if (!pool->dma_pages)
>  		return;

This seems to be used only in the setup/teardown paths, so you're optimizing
a control path. Is there a fast path that uses this code? I walked through
the ice driver. If it's just setup code, we should do whatever is more
readable.
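Since this is control-path code, the more interesting part of this hunk to me is correctness: once the early return keys off the pointer, the pool->dma_pages = NULL reset that the patch adds after kvfree() is what keeps the check meaningful after teardown. A hypothetical model follows; struct pool_model, model_dma_unmap(), and the teardowns counter are made up for illustration, and the xsk_dma_map refcounting is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the pool fields xp_dma_unmap() touches. */
struct pool_model {
	uint64_t *dma_pages;
	uint32_t dma_pages_cnt;
	int teardowns; /* how many times the teardown body actually ran */
};

/* Patched-style teardown: the early return keys off the pointer, so the
 * pointer has to be cleared after it is freed. */
static void model_dma_unmap(struct pool_model *p)
{
	if (!p->dma_pages)
		return;

	free(p->dma_pages);  /* stands in for kvfree() */
	p->dma_pages = NULL; /* mirrors the line the patch adds */
	p->dma_pages_cnt = 0;
	p->teardowns++;
}
```

Calling model_dma_unmap() twice runs the teardown body once; without the NULL reset, the second call would pass the check and free() a dangling pointer.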

>
>  	dma_map = xp_find_dma_map(pool);
> @@ -364,6 +364,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>
>  	__xp_dma_unmap(dma_map, attrs);
>  	kvfree(pool->dma_pages);
> +	pool->dma_pages = NULL;
>  	pool->dma_pages_cnt = 0;
>  	pool->dev = NULL;
>  }
> @@ -503,7 +504,7 @@ static struct xdp_buff_xsk *__xp_alloc(struct xsk_buff_pool *pool)
>  	if (pool->unaligned) {
>  		xskb = pool->free_heads[--pool->free_heads_cnt];
>  		xp_init_xskb_addr(xskb, pool, addr);
> -		if (pool->dma_pages_cnt)
> +		if (pool->dma_pages)
>  			xp_init_xskb_dma(xskb, pool, pool->dma_pages, addr);
>  	} else {
>  		xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];
> @@ -569,7 +570,7 @@ static u32 xp_alloc_new_from_fq(struct xsk_buff_pool *pool, struct xdp_buff **xd
>  	if (pool->unaligned) {
>  		xskb = pool->free_heads[--pool->free_heads_cnt];
>  		xp_init_xskb_addr(xskb, pool, addr);
> -		if (pool->dma_pages_cnt)
> +		if (pool->dma_pages)
>  			xp_init_xskb_dma(xskb, pool, pool->dma_pages, addr);

Both of the _alloc_ cases read the neighboring free_heads_cnt, so I guess
you're saving a load? This is so deep into micro-optimization that I'm
curious whether you could measure it.

>  	} else {
>  		xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];

I'm not actually against optimizing, but here's another idea: why do we have
to check at all? It seems that if the DMA has been disabled/unmapped, the
driver shouldn't be trying to call xsk_buff_alloc_batch() in the first place.
Then you could just drop the 'if' check.

It feels to me that drivers shouldn't even be calling this after unmapping
the DMA. WDYT?
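Roughly what I have in mind, as a hypothetical sketch rather than a proposed patch: the unaligned alloc path initializes the DMA address unconditionally and treats an active mapping as a caller precondition. struct pool_model, struct xskb_model, model_init_xskb_dma(), and the 4 KiB page constants are assumptions. The address math is simplified and does not strip flag bits such as XSK_NEXT_PG_CONTIG_MASK, and in the kernel the debug check would more likely be a WARN_ON_ONCE() than assert().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page constants; the kernel uses PAGE_SHIFT/PAGE_SIZE. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-ins for the pool and buffer structs. */
struct pool_model {
	uint64_t *dma_pages; /* per-page DMA addresses; must be mapped */
};

struct xskb_model {
	uint64_t dma; /* DMA address of the buffer */
};

/* No 'if (pool->dma_pages)' branch: callers are assumed never to
 * allocate after the DMA mapping is torn down. The assert only
 * documents that contract in debug builds. */
static void model_init_xskb_dma(struct xskb_model *xskb,
				const struct pool_model *pool, uint64_t addr)
{
	assert(pool->dma_pages != NULL);
	/* Page base plus offset within the page; flag bits are not stripped. */
	xskb->dma = pool->dma_pages[addr >> MODEL_PAGE_SHIFT] +
		    (addr & (MODEL_PAGE_SIZE - 1));
}
```

Whether that contract actually holds for every caller of the alloc path is exactly the open question above.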