Re: [PATCH] net: sched: print jiffies when transmit queue time out

From: Yajun Deng
Date: Wed Apr 19 2023 - 22:17:32 EST


April 20, 2023 9:27 AM, "Jakub Kicinski" <kuba@xxxxxxxxxx> wrote:

> On Wed, 19 Apr 2023 19:56:32 +0800 Yajun Deng wrote:
>
>> Although there is watchdog_timeo to let users know when the transmit queue
>> begin stall, but dev_watchdog() is called with an interval. The jiffies
>> will always be greater than watchdog_timeo.
>>
>> To let users know the exact time the stall started, print jiffies when
>> the transmit queue time out.
>
> Please add an explanation of how this information is useful in practice.

We found some cases with several warnings. We want to confirm which happened first.

First warning:
16:37:57 kernel: [ 7100.097547] ------------[ cut here ]------------
16:37:57 kernel: [ 7100.097550] NETDEV WATCHDOG: eno2 (i40e): transmit queue 8 timed out
16:37:57 kernel: [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270
...

Second warning:
16:38:44 kernel: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
16:38:44 kernel: [ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x4000000000000000 softirq=367 3137/3673146 fqs=13844
16:38:44 kernel: [ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
16:38:44 kernel: [ 7147.756962] NMI backtrace for cpu 24
...

As we can see, the transmit queue start stall should be before 16:37:52, the rcu start stall is 16:37:44.
These two times are closer, we want to confirm which happened first.