Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

From: Jesper Dangaard Brouer
Date: Wed Dec 16 2020 - 03:47:19 EST


On Tue, 15 Dec 2020 18:49:55 +0000
Edward Cree <ecree.xilinx@xxxxxxxxx> wrote:

> On 15/12/2020 09:43, Jesper Dangaard Brouer wrote:
> > On Mon, 14 Dec 2020 17:29:06 -0800
> > Ivan Babrou <ivan@xxxxxxxxxxxxxx> wrote:
> >
> >> Without this change the driver tries to allocate too many queues,
> >> breaching the number of available msi-x interrupts on machines
> >> with many logical cpus and default adapter settings:
> >>
> >> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> >>
> >> Which in turn triggers EINVAL on XDP processing:
> >>
> >> sfc 0000:86:00.0 ext0: XDP TX failed (-22)
> >
> > I have a similar QA report with XDP_REDIRECT:
> > sfc 0000:05:00.0 ens1f0np0: XDP redirect failed (-22)
> >
> > Here we are back to the issue we discussed with ixgbe, that NIC / msi-x
> > interrupts hardware resources are not enough on machines with many
> > logical cpus.
> >
> > After this fix, what will happen if (cpu >= efx->xdp_tx_queue_count) ?
>
> Same as happened before: the "failed -22". But this fix will make that
> less likely to happen, because it ties more TXQs to each EVQ, and it's
> the EVQs that are in short supply.
>
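
To make the resource arithmetic in the log line concrete, here is a small C model. It is a hypothetical sketch, not the sfc code. It assumes one XDP TX queue per CPU, a fixed number of TX queues packed onto each event queue, and one MSI-X vector per event queue. The 48 CPUs and 4 TXQs per EVQ used with it are assumed values, picked only because they yield the 12 XDP event queues from the log; they are not taken from the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Ceiling division, like the kernel's DIV_ROUND_UP(). */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	assert(d > 0);
	return (n + d - 1) / d;
}

/*
 * Hypothetical model: one XDP TXQ per CPU, packed txqs_per_evq to an
 * event queue. Returns how many XDP event queues that needs.
 */
static unsigned int xdp_evqs_needed(unsigned int n_cpus,
				    unsigned int txqs_per_evq)
{
	return div_round_up(n_cpus, txqs_per_evq);
}

/*
 * Hypothetical budget check: the XDP EVQs must fit next to the "other"
 * channels within the maximum number of channels (interrupt vectors).
 */
static bool evq_budget_fits(unsigned int n_cpus, unsigned int txqs_per_evq,
			    unsigned int other_channels,
			    unsigned int max_channels)
{
	return other_channels + xdp_evqs_needed(n_cpus, txqs_per_evq) <=
	       max_channels;
}
```

Under those assumptions, 24 + 12 = 36 channels exceeds the maximum of 32, while packing 8 TXQs per EVQ needs only 6 EVQs (30 channels in total). Fewer EVQs only postpone the problem, though: on a machine with enough CPUs, some CPUs are still left without an XDP TX queue.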

So, what I hear is that this fix is just papering over the real issue.

I suggest that you/we detect the situation and add a code path that
takes a lock (once per bulk of up to 16 packets) and solves the issue.
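
A minimal user-space sketch of such a fallback follows. It assumes CPUs are mapped onto the available queues with cpu % n_queues and that a bulk is at most 16 frames. All names (struct shared_txq, xdp_xmit_shared() and the helpers) are hypothetical, and a C11 atomic spinlock stands in for the kernel's spin_lock(); it is not the sfc driver's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define XDP_BULK_MAX 16 /* assumed bulk size, matching "16 packets" above */

/* Hypothetical stand-in for a TX queue shared by several CPUs. */
struct shared_txq {
	atomic_bool locked; /* minimal spinlock standing in for spin_lock() */
	size_t enqueued;    /* frames "posted" to this queue so far */
};

static void txq_init(struct shared_txq *txq)
{
	atomic_init(&txq->locked, false);
	txq->enqueued = 0;
}

static void txq_lock(struct shared_txq *txq)
{
	while (atomic_exchange_explicit(&txq->locked, true,
					memory_order_acquire))
		; /* spin until the current holder releases the queue */
}

static void txq_unlock(struct shared_txq *txq)
{
	atomic_store_explicit(&txq->locked, false, memory_order_release);
}

/*
 * Fallback transmit when CPUs outnumber XDP TX queues: map the CPU onto a
 * shared queue and take its lock once for the whole bulk, so the locking
 * cost is amortised over up to XDP_BULK_MAX frames. Returns frames sent.
 */
static int xdp_xmit_shared(struct shared_txq *queues, unsigned int n_queues,
			   unsigned int cpu, void **frames, int n)
{
	struct shared_txq *txq;
	int i;

	assert(n_queues > 0);
	txq = &queues[cpu % n_queues];
	if (n > XDP_BULK_MAX)
		n = XDP_BULK_MAX;

	txq_lock(txq);
	for (i = 0; i < n; i++) {
		(void)frames[i]; /* a real driver would post frames[i] here */
		txq->enqueued++;
	}
	txq_unlock(txq);
	return n;
}
```

Locking per bulk rather than per frame means the atomic operations are paid once for up to 16 frames, so the shared-queue path stays fairly cheap even though it is no longer lock-free.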

If you care about maximum performance, you can implement this by
changing the ndo_xdp_xmit pointer to the fallback function when needed,
to avoid having to check for the fallback mode in the fast path.
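
The pointer swap can be sketched as choosing the transmit function once, when queues are allocated, so the per-CPU fast path carries no extra branch. The types and functions below are hypothetical simplifications; the real ndo_xdp_xmit hook takes a struct net_device, an array of struct xdp_frame pointers and a flags word.

```c
#include <assert.h>

#define XDP_BULK_MAX 16 /* assumed bulk size */

/* Hypothetical, simplified hook; the real one also takes dev, frames, flags. */
typedef int (*xdp_xmit_fn)(unsigned int cpu, int n);

/* Fast-path stub: each CPU owns a TX queue, so no lock and no mode check. */
static int xdp_xmit_percpu(unsigned int cpu, int n)
{
	(void)cpu;
	return n;
}

/* Fallback stub: CPUs share queues, so a real version would lock per bulk. */
static int xdp_xmit_shared(unsigned int cpu, int n)
{
	(void)cpu;
	return n > XDP_BULK_MAX ? XDP_BULK_MAX : n;
}

/* Hypothetical per-device ops table holding the selected hook. */
struct fake_xdp_ops {
	xdp_xmit_fn xdp_xmit;
};

/* Decide once at setup time instead of branching on every transmit. */
static void select_xdp_xmit(struct fake_xdp_ops *ops, unsigned int n_cpus,
			    unsigned int n_xdp_txqs)
{
	ops->xdp_xmit = (n_xdp_txqs >= n_cpus) ? xdp_xmit_percpu
					       : xdp_xmit_shared;
}
```

Since a driver's net_device_ops table is normally a shared const struct, one possible way to do the swap in a real driver is to keep two ops tables and point dev->netdev_ops at the one whose ndo_xdp_xmit is the fallback.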

>
> (Strictly speaking, I believe the limitation is a software one, that
> comes from the driver's channel structures having been designed a
> decade ago when 32 cpus ought to be enough for anybody... AFAIR the
> hardware is capable of giving us something like 1024 evqs if we ask
> for them, it just might not have that many msi-x vectors for us.)
> Anyway, the patch looks correct, so
> Acked-by: Edward Cree <ecree.xilinx@xxxxxxxxx>

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer