RE: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

From: Elliott, Robert (Servers)
Date: Tue Nov 08 2022 - 00:53:42 EST




> -----Original Message-----
> From: Leizhen (ThunderTown) <thunder.leizhen@xxxxxxxxxx>
> Sent: Monday, November 7, 2022 9:07 PM
> Subject: Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information
>
> On 2022/11/8 5:57, Elliott, Robert (Servers) wrote:
> > I created a 22 second stall, which triggered two self-detected stall
> > messages. The second one covers 18 seconds (and reports 17444 ms
> > of system cputime), but still reports the half_timeout of 2.5 s on
>
> Because I have not updated rsrp->gp_seq in print_cpu_stat_info().
>
> Please add rsrp->gp_seq-- at the bottom of print_cpu_stat_info()
> and try again.
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 2e560a70d88fd87..6f6c95d599e6436 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -455,6 +455,7 @@ static void print_cpu_stat_info(int cpu)
> div_u64(cpustat[CPUTIME_SOFTIRQ] - rsrp->cputime_softirq,
> NSEC_PER_MSEC),
> div_u64(cpustat[CPUTIME_SYSTEM] - rsrp->cputime_system,
> NSEC_PER_MSEC),
> jiffies64_to_msecs(half_timeout));
> + rsrp->gp_seq--;
> }


That causes a lot more expedited stall messages to be printed (52 of them).
[ 80.709572] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 3 jiffies s: 481 root: 0x2/.
[ 80.743724] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 311 jiffies s: 481 root: 0x2/.
[ 81.028028] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 620 jiffies s: 481 root: 0x2/.
[ 81.313255] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 929 jiffies s: 481 root: 0x2/.
[ 81.598483] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 1238 jiffies s: 481 root: 0x2/.
[ 81.883710] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 1547 jiffies s: 481 root: 0x2/.
...
[ 95.077093] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 15840 jiffies s: 481 root: 0x2/.
[ 95.362327] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 16149 jiffies s: 481 root: 0x2/.
[ 95.647549] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {


Of the three self-detected prints, the second and third system cputimes
of 8809 and 8655 are still greater than 1500.

[ 73.774167] tcrypt: rcu testing - kernel_fpu_disable for rude 22 s
[ 76.540071] rcu: INFO: rcu_preempt self-detected stall on CPU 22
[ 76.546118] rcu: 22-....: (2994 ticks this GP) idle=716c/1/0x4000000000000000 softirq=2095/2095 fqs=751
[ 76.555661] rcu: CPU 22 hardirqs softirqs csw system cond_resched
[ 76.564067] rcu: CPU 22 count: 0 4 0 0
[ 76.572560] rcu: CPU 22 cputime: 3 0 1493 ==> 1500 (ms)
[ 76.582277] rcu: CPU 22 current: in_kernel_fpu_begin=1 this_cpu_preemptible=0
[ 76.589546] rcu: (self-detected stall on CPU 22 t=3054 jiffies g=13897 q=274 ncpus=56)
...
[ 85.021211] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
[ 85.158741] rcu: INFO: rcu_preempt self-detected stall on CPU 22
[ 85.159351] 22-....
[ 85.163204] rcu: 22-....: (11839 ticks this GP) idle=716c/1/0x4000000000000000 softirq=2095/2095 fqs=2959
[ 85.163207] rcu: CPU 22 hardirqs softirqs csw system cond_resched
[ 85.163208] rcu: CPU 22 count: 0 43 0 0
[ 85.163210] rcu: CPU 22 cputime: 491 0 8809 ==> 1500 (ms)
[ 85.166980] } 4831 jiffies s: 481 root: 0x2/.
[ 85.170745] rcu: CPU 22 current: in_kernel_fpu_begin=1 this_cpu_preemptible=0
[ 85.170747] rcu: (self-detected stall on CPU 22 t=12351 jiffies g=13897 q=2542 ncpus=56)
...
[ 93.899261] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
[ 93.920482] rcu: INFO: rcu_preempt self-detected stall on CPU 22
[ 93.933812] 22-....
[ 93.939063] rcu: 22-....: (20543 ticks this GP) idle=716c/1/0x4000000000000000 softirq=2095/2095 fqs=5216
[ 93.939066] rcu: CPU 22 hardirqs softirqs csw system cond_resched
[ 93.939067] rcu: CPU 22 count: 0 59 0 0
[ 93.946245] } 14342 jiffies s: 481 root: 0x2/.
[ 93.953415] rcu: CPU 22 cputime: 777 0 8655 ==> 1500 (ms)
[ 93.953418] rcu: CPU 22 current: in_kernel_fpu_begin=1 this_cpu_preemptible=0
[ 93.953419] rcu: (self-detected stall on CPU 22 t=21865 jiffies g=13897 q=6284 ncpus=56)
[ 93.953421] CPU: 22 PID: 2604 Comm: modprobe Not tainted 6.0.0+ #13
[ 93.960598] rcu: blocking rcu_node structures (internal RCU debug):

(I added "CPU %d" to all the self-detected lines to try to help the interleaving)