Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

From: Eric Dumazet
Date: Wed Feb 14 2024 - 11:59:42 EST


On Wed, Feb 14, 2024 at 5:49 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
>
> On Wed, Feb 14, 2024 at 04:41:36PM +0100, Eric Dumazet wrote:
> > On Wed, Feb 14, 2024 at 3:45 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 13, 2024 at 10:04:57AM -0800, Jakub Kicinski wrote:
> > > > On Tue, 13 Feb 2024 14:57:49 +0100 Eric Dumazet wrote:
> > > > > Please note that adding other sysfs entries is expensive for workloads
> > > > > creating/deleting netdev and netns often.
> > > > >
> > > > > I _think_ we should find a way for not creating
> > > > > /sys/class/net/<interface>/queues/tx-{Q}/byte_queue_limits directory
> > > > > and files
> > > > > for non BQL enabled devices (like loopback !)
> > > >
> > > > We should try, see if anyone screams. We could use IFF_NO_QUEUE, and
> > > > NETIF_F_LLTX as a proxy for "device doesn't have a real queue so BQL
> > > > would be pointless"? Obviously better to annotate the drivers which
> > > > do have BQL support, but there's >50 of them on a quick count..
> > >
> > > Let me make sure I understand the suggestion above. We want to disable
> > > BQL completely for devices that have dev->features & NETIF_F_LLTX or
> > > dev->priv_flags & IFF_NO_QUEUE, right?
> > >
> > > Maybe we can add a ->enabled field in struct dql, and set it according
> > > to the features above. Then we can create the sysfs entries and process
> > > the dql operations based on that field. This should also avoid some
> > > unnecessary calls if we are not displaying sysfs.
> > >
> > > Here is a very simple PoC to represent what I had in mind. Am I in the
> > > right direction?
> >
> > No, this was really about sysfs entries (aka dql_group)
> >
> > Partial patch would be:
>
> That is simpler than what I imagined. Thanks!
>
> for netdev_uses_bql(), would it be similar to what I proposed in the
> previous message? Let me copy it here.
>
> static bool netdev_uses_bql(struct net_device *dev)
> {
> 	if (dev->features & NETIF_F_LLTX ||
> 	    dev->priv_flags & IFF_NO_QUEUE)
> 		return false;
>
> 	return true;
> }

I think this should be fine, yes.
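
For anyone who wants to try the predicate outside the kernel tree, here is a minimal userspace sketch of the same check. The struct below is a hypothetical stand-in with only the two fields the helper reads, and the flag bit values are placeholders chosen for illustration; they are not the kernel's definitions. It only demonstrates the logic (a device counts as BQL-capable unless either flag is set), not how the sysfs group creation would be gated.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit values, for illustration only. The real
 * NETIF_F_LLTX and IFF_NO_QUEUE are defined in kernel headers
 * and have different values.
 */
#define NETIF_F_LLTX	(1ULL << 0)
#define IFF_NO_QUEUE	(1U << 0)

/* Hypothetical stand-in for struct net_device: only the fields used here. */
struct net_device {
	uint64_t features;
	unsigned int priv_flags;
};

/*
 * Returns false for devices that have no real TX queue (lockless TX
 * or IFF_NO_QUEUE), where BQL would be pointless; true otherwise.
 */
static bool netdev_uses_bql(const struct net_device *dev)
{
	if (dev->features & NETIF_F_LLTX ||
	    dev->priv_flags & IFF_NO_QUEUE)
		return false;

	return true;
}
```

With this, a loopback-like device (either flag set) reports false, and a device with neither flag reports true.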