Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

From: Breno Leitao
Date: Wed Feb 14 2024 - 11:49:45 EST


On Wed, Feb 14, 2024 at 04:41:36PM +0100, Eric Dumazet wrote:
> On Wed, Feb 14, 2024 at 3:45 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 13, 2024 at 10:04:57AM -0800, Jakub Kicinski wrote:
> > > On Tue, 13 Feb 2024 14:57:49 +0100 Eric Dumazet wrote:
> > > > Please note that adding other sysfs entries is expensive for workloads
> > > > creating/deleting netdev and netns often.
> > > >
> > > > I _think_ we should find a way for not creating
> > > > /sys/class/net/<interface>/queues/tx-{Q}/byte_queue_limits directory
> > > > and files
> > > > for non BQL enabled devices (like loopback !)
> > >
> > > We should try, see if anyone screams. We could use IFF_NO_QUEUE, and
> > > NETIF_F_LLTX as a proxy for "device doesn't have a real queue so BQL
> > > would be pointless"? Obviously better to annotate the drivers which
> > > do have BQL support, but there's >50 of them on a quick count..
> >
> > Let me make sure I understand the suggestion above. We want to disable
> > BQL completely for devices that have dev->features & NETIF_F_LLTX or
> > dev->priv_flags & IFF_NO_QUEUE, right?
> >
> > Maybe we can add a ->enabled field in struct dql, and set it according
> > to the features above. Then we can create the sysfs entries and process
> > the dql operations based on that field. This should also avoid some
> > unnecessary calls if we are not exposing the sysfs entries.
> >
> > Here is a very simple PoC to represent what I had in mind. Am I in the
> > right direction?
>
> No, this was really about sysfs entries (aka dql_group)
>
> Partial patch would be:

That is simpler than what I imagined. Thanks!

> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index a09d507c5b03d24a829bf7af0b7cf1e6a0bdb65a..094e3b2d78cca40d810b2fa3bd4393d22b30e6ad 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1709,9 +1709,11 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
>  		goto err;
>
>  #ifdef CONFIG_BQL
> -	error = sysfs_create_group(kobj, &dql_group);
> -	if (error)
> -		goto err;
> +	if (netdev_uses_bql(dev)) {

For netdev_uses_bql(), would it be similar to what I proposed in my
previous message? Let me copy it here:

static bool netdev_uses_bql(struct net_device *dev)
{
	if (dev->features & NETIF_F_LLTX ||
	    dev->priv_flags & IFF_NO_QUEUE)
		return false;

	return true;
}
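
To make the intended behaviour easy to check outside the kernel, here is a
minimal user-space sketch of the same predicate. Everything prefixed with
mock_/MOCK_ is a hypothetical stand-in: the struct carries only the two
fields the check reads, and the flag bit values are made up for illustration
and do not match the real NETIF_F_LLTX / IFF_NO_QUEUE definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in flag bits for illustration only. The real NETIF_F_LLTX and
 * IFF_NO_QUEUE are defined in kernel headers and have different values.
 */
#define MOCK_NETIF_F_LLTX	(1ULL << 0)
#define MOCK_IFF_NO_QUEUE	(1U << 0)

/* Hypothetical, trimmed-down stand-in for struct net_device. */
struct mock_net_device {
	uint64_t features;		/* stands in for dev->features */
	unsigned int priv_flags;	/* stands in for dev->priv_flags */
};

/*
 * Mirrors the proposed check: a device flagged as lockless TX or as
 * having no queue is treated as not using BQL, so its dql sysfs group
 * could be skipped.
 */
static bool mock_netdev_uses_bql(const struct mock_net_device *dev)
{
	if (dev->features & MOCK_NETIF_F_LLTX)
		return false;
	if (dev->priv_flags & MOCK_IFF_NO_QUEUE)
		return false;

	return true;
}
```

With this sketch, a loopback-like device that sets either flag yields false,
while a device with neither flag set yields true. Note that this is only a
proxy: as mentioned earlier in the thread, annotating the drivers that
really support BQL would be more accurate.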

Thanks