Re: [PATCH v6 1/2] rcu: Add RCU stall diagnosis information

From: Leizhen (ThunderTown)
Date: Thu Nov 10 2022 - 03:27:45 EST

On 2022/11/10 0:55, Elliott, Robert (Servers) wrote:
>
>
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index a465d5242774af8..2729f3ad11d108b 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -5082,6 +5082,12 @@
>> rcupdate.rcu_cpu_stall_timeout to be used (after
>> conversion from seconds to milliseconds).
>>
>> + rcupdate.rcu_cpu_stall_cputime= [KNL]
>> + Provide statistics on the cputime and count of
>> + interrupts and tasks during the sampling period. For
>> + multiple continuous RCU stalls, all sampling periods
>> + begin at half of the first RCU stall timeout.
>
> This description should start with:
> "In kernels built with CONFIG_RCU_CPU_STALL_TIME=y, "
>
> Also, that parameter name seems like it contains a time value, but
> it's really just treated as zero vs. anything else. Consider renaming
> it to rcu_cpu_stall_cputime_en or describing the values in the
> description ("0 disables, all other values enable").
>
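To make the naming concern concrete: as described above, the parameter is only tested for zero versus nonzero, so a value such as 30 enables the feature rather than selecting a 30-second window. Below is a minimal userspace sketch. It assumes the parameter is an int checked with a plain nonzero test. The variable and helper names are hypothetical stand-ins, not code from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the rcupdate.rcu_cpu_stall_cputime parameter.
 * In the kernel it would be set from the boot command line; here it is an
 * ordinary global so the logic can be exercised in userspace.
 */
static int rcu_cpu_stall_cputime;

/* Only zero vs. nonzero matters: the value is never used as a duration. */
static bool stall_cputime_enabled(void)
{
	return rcu_cpu_stall_cputime != 0;
}
```

Under this reading, documenting the values as "0 disables, all other values enable" matches the behavior exactly.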
>> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
>> +struct rcu_snap_record {
>> + unsigned long gp_seq; /* Track rdp->gp_seq counter */
>> + u64 cputime_irq; /* Accumulated cputime of hard irqs */
>> + u64 cputime_softirq;/* Accumulated cputime of soft irqs */
>> + u64 cputime_system; /* Accumulated cputime of kernel tasks */
>> + unsigned long nr_hardirqs; /* Accumulated number of hard irqs */
>> + unsigned int nr_softirqs; /* Accumulated number of soft irqs */
>
> That should be "unsigned long" to match the other patch

We have discussed this before, and you mentioned:

struct kernel_stat {
	unsigned long irqs_sum;
	unsigned int softirqs[NR_SOFTIRQS];
};

The softirqs field is an unsigned int, so the new function doesn't have
this inconsistency.
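The type argument can be checked with a small model. It assumes that the softirq sum is computed as an unsigned int, matching the unsigned int softirqs[] counters in struct kernel_stat. Under that assumption, keeping the snapshot as unsigned int makes the stall-time delta correct even if the counter wraps between the two samples. Widening only the snapshot to unsigned long breaks this on 64-bit targets. The struct and helper names are hypothetical and are used for illustration only:

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical userspace model of the nr_softirqs snapshot. The per-CPU
 * softirq counters in struct kernel_stat are unsigned int, so their sum is
 * assumed here to be an unsigned int as well.
 */
struct snap_model {
	unsigned int nr_softirqs;	/* sum recorded at sampling start */
};

/* Delta computed entirely in unsigned int arithmetic: correct modulo 2^32. */
static unsigned int softirqs_delta(unsigned int now, const struct snap_model *snap)
{
	return now - snap->nr_softirqs;
}

/*
 * Same computation with the snapshot widened to unsigned long. On LP64
 * targets, 'now' is converted to 64 bits before the subtraction, so a
 * 32-bit wrap between the two samples yields a huge value, not the delta.
 */
static unsigned long softirqs_delta_widened(unsigned int now, unsigned long snap)
{
	return now - snap;
}
```

For example, take a snapshot of UINT_MAX - 4 and a current sum of 5. The unsigned int helper returns 10, while the widened version returns a value close to 2^64 on LP64.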

>
>
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> +static void print_cpu_stat_info(int cpu)
>> +{
> ...
>> + pr_err(" hardirqs softirqs csw/system\n");
>> + pr_err(" number: %8ld %10d %12lld\n",
>> + kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
>> + kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
>> + nr_context_switches_cpu(cpu) - rsrp->nr_csw);
>> + pr_err("cputime: %8lld %10lld %12lld ==> %lld(ms)\n",
>
> Those should all start with "\t" to match other related prints.

Right, thanks.
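The following sketch shows the requested formatting, with every line of the report starting with "\t". It runs in userspace, so snprintf() stands in for pr_err(). The function name and the input values are hypothetical. The sketch also uses unsigned conversion specifiers (%lu, %u, %llu) for the unsigned counters. That choice belongs to this sketch and is not part of the review comment:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch of the suggested layout: each line begins with "\t"
 * so it lines up with the other per-CPU stall prints. The header columns
 * are right-aligned to the widths used in the "number:" line.
 */
static int format_stat_lines(char *buf, size_t len, unsigned long hardirqs,
			     unsigned int softirqs, unsigned long long csw)
{
	return snprintf(buf, len,
			"\t         hardirqs   softirqs   csw/system\n"
			"\t number: %8lu %10u %12llu\n",
			hardirqs, softirqs, csw);
}
```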

>
>
>

--
Regards,
Zhen Lei