Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

From: Breno Leitao
Date: Wed Feb 14 2024 - 12:32:39 EST


On Wed, Feb 14, 2024 at 05:58:37PM +0100, Eric Dumazet wrote:
> On Wed, Feb 14, 2024 at 5:49 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
> >
> > On Wed, Feb 14, 2024 at 04:41:36PM +0100, Eric Dumazet wrote:
> > > On Wed, Feb 14, 2024 at 3:45 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Feb 13, 2024 at 10:04:57AM -0800, Jakub Kicinski wrote:
> > > > > On Tue, 13 Feb 2024 14:57:49 +0100 Eric Dumazet wrote:
> > > > > > Please note that adding other sysfs entries is expensive for workloads
> > > > > > creating/deleting netdev and netns often.
> > > > > >
> > > > > > I _think_ we should find a way for not creating
> > > > > > /sys/class/net/<interface>/queues/tx-{Q}/byte_queue_limits directory
> > > > > > and files
> > > > > > for non BQL enabled devices (like loopback !)
> > > > >
> > > > > We should try, see if anyone screams. We could use IFF_NO_QUEUE, and
> > > > > NETIF_F_LLTX as a proxy for "device doesn't have a real queue so BQL
> > > > > would be pointless"? Obviously better to annotate the drivers which
> > > > > do have BQL support, but there's >50 of them on a quick count..
> > > >
> > > > Let me make sure I understand the suggestion above. We want to disable
> > > > BQL completely for devices that have dev->features & NETIF_F_LLTX or
> > > > dev->priv_flags & IFF_NO_QUEUE, right?
> > > >
> > > > Maybe we can add an ->enabled field in struct dql, and set it according
> > > > to the features above. Then we can create the sysfs entries and process
> > > > the dql operations based on that field. This should also avoid some
> > > > unnecessary calls if we are not displaying sysfs.
> > > >
> > > > Here is a very simple PoC to represent what I had in mind. Am I in the
> > > > right direction?
> > >
> > > No, this was really about sysfs entries (aka dql_group)
> > >
> > > Partial patch would be:
> >
> > That is simpler than what I imagined. Thanks!
> >
>
> >
> > For netdev_uses_bql(), would it be similar to what I proposed in the
> > previous message? Let me copy it here.
> >
> > static bool netdev_uses_bql(struct net_device *dev)
> > {
> > 	if (dev->features & NETIF_F_LLTX ||
> > 	    dev->priv_flags & IFF_NO_QUEUE)
> > 		return false;
> >
> > 	return true;
> > }
>
> I think this should be fine, yes.

Awesome, thanks.
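For reference, a minimal userspace sketch of the predicate quoted above, so the logic can be compiled and exercised outside the kernel. It assumes mock definitions: struct mock_net_device and the MOCK_* flag bits are hypothetical placeholders, not the kernel's real struct net_device, NETIF_F_LLTX or IFF_NO_QUEUE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder bits: the real NETIF_F_LLTX and IFF_NO_QUEUE
 * are defined in the kernel headers with different values. */
#define MOCK_NETIF_F_LLTX (1ULL << 0)
#define MOCK_IFF_NO_QUEUE (1U << 0)

/* Hypothetical stand-in for struct net_device, reduced to the two
 * fields the check reads. */
struct mock_net_device {
	unsigned long long features;	/* stands in for dev->features */
	unsigned int priv_flags;	/* stands in for dev->priv_flags */
};

/* Same logic as the netdev_uses_bql() quoted above: lockless (LLTX) or
 * queueless devices are treated as not using BQL. */
static bool mock_netdev_uses_bql(const struct mock_net_device *dev)
{
	if ((dev->features & MOCK_NETIF_F_LLTX) ||
	    (dev->priv_flags & MOCK_IFF_NO_QUEUE))
		return false;

	return true;
}
```

A device with neither flag set is reported as using BQL; setting either flag alone is enough to make the helper return false.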

I am planning to send this separately from the "net: dqs: add NIC stall
detector based on BQL" patch, since there isn't really a dependency here.
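A rough userspace model of the intended gating, under stated assumptions: the check is placed in the per-queue sysfs setup path, and mock_tx_queue_add_bql_group() / mock_register_tx_queues() are hypothetical stand-ins (the printf() takes the place of creating dql_group via sysfs_create_group()). This is an illustration only, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical placeholder bits, not the kernel's real values. */
#define MOCK_NETIF_F_LLTX (1ULL << 0)
#define MOCK_IFF_NO_QUEUE (1U << 0)

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct mock_net_device {
	const char *name;
	unsigned long long features;
	unsigned int priv_flags;
	unsigned int num_tx_queues;
};

/* Mirrors the netdev_uses_bql() logic discussed in this thread. */
static bool mock_netdev_uses_bql(const struct mock_net_device *dev)
{
	return !((dev->features & MOCK_NETIF_F_LLTX) ||
		 (dev->priv_flags & MOCK_IFF_NO_QUEUE));
}

/* Models the per-queue step that would create the dql_group files
 * (the byte_queue_limits directory). Returns true if the group was
 * "created"; printf() stands in for the real sysfs call. */
static bool mock_tx_queue_add_bql_group(const struct mock_net_device *dev,
					unsigned int q)
{
	if (!mock_netdev_uses_bql(dev))
		return false;	/* skip: no BQL directory, no sysfs cost */

	printf("create /sys/class/net/%s/queues/tx-%u/byte_queue_limits\n",
	       dev->name, q);
	return true;
}

/* Walks all TX queues and counts how many BQL groups were created. */
static unsigned int mock_register_tx_queues(const struct mock_net_device *dev)
{
	unsigned int created = 0;

	for (unsigned int q = 0; q < dev->num_tx_queues; q++)
		if (mock_tx_queue_add_bql_group(dev, q))
			created++;
	return created;
}
```

With this shape, a four-queue NIC would still get four byte_queue_limits directories, while a device flagged LLTX or IFF_NO_QUEUE gets none. That skipped sysfs work is where the savings for workloads that create and delete netdevs and netns often would come from.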