Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

From: Leizhen (ThunderTown)
Date: Sun Nov 06 2022 - 22:21:40 EST




On 2022/11/6 4:32, Paul E. McKenney wrote:
> On Sat, Nov 05, 2022 at 03:03:14PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/11/5 9:58, Elliott, Robert (Servers) wrote:
>
> [ . . . ]
>
>>>> +int rcu_cpu_stall_cputime __read_mostly =
>>>> IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
>>>
>>> As a config option and module parameter, adding some more
>>> instrumentation overhead might be worthwhile for other
>>> likely causes of rcu stalls.
>>>
>>> For example, if enabled, have these functions (if available
>>> on the architecture) maintain a per-CPU running count of
>>> their invocations, which also cause the CPU to be unavailable
>>> for rcu:
>>> - kernel_fpu_begin() calls - FPU/SIMD context preservation,
>>> which also calls preempt_disable()
>>> - preempt_disable() calls - scheduler context switches disabled
>>> - local_irq_save() calls - interrupts disabled
>>> - cond_resched() calls - lack of these is a problem
>>>
>>> For kernel_fpu_begin and preempt_disable, knowing if it is
>>> currently blocked for those reasons is probably the most
>>> helpful.
>>
>> These instructions are already in Documentation/RCU/stallwarn.rst
>
> Excellent point -- this document also needs to be updated with this
> new information. I have pulled in your four patches as noted in my
> previous email. They are on the -rcu tree's "dev" branch.

OK, thanks.

>
> Could you please send a patch containing an initial update to
> stallwarn.rst? The main thing I need is your perspective on how each
> field is used.

Okay, I'll add some descriptions to illustrate how to use this feature
to identify each type of RCU stall.
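
For reference, the per-CPU counting idea suggested earlier in the thread
could look roughly like the sketch below. This is a userspace illustration
only, not kernel code and not part of this patch series. Every name in it
(struct stall_diag, the diag_*() helpers, the fixed NR_CPUS value) is
hypothetical. A real implementation would presumably use per-CPU variables
and hook the actual preempt_disable()/cond_resched() paths.

```c
/*
 * Userspace sketch only. All names are hypothetical and exist purely
 * to illustrate "running count per CPU" plus "is it blocked right now".
 */
#include <stdio.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the demo */

struct stall_diag {
	unsigned long preempt_disable_calls;	/* running count of disables */
	unsigned long cond_resched_calls;	/* running count of resched points */
	int preempt_depth;			/* > 0 means currently non-preemptible */
};

static struct stall_diag diag[NR_CPUS];

/* Record entry into a non-preemptible region on @cpu. */
static void diag_preempt_disable(int cpu)
{
	diag[cpu].preempt_disable_calls++;
	diag[cpu].preempt_depth++;
}

/* Record exit from a non-preemptible region on @cpu. */
static void diag_preempt_enable(int cpu)
{
	if (diag[cpu].preempt_depth > 0)
		diag[cpu].preempt_depth--;
}

/* Record a voluntary reschedule point on @cpu. */
static void diag_cond_resched(int cpu)
{
	diag[cpu].cond_resched_calls++;
}

/* Return 1 if @cpu is inside a non-preemptible region right now. */
static int diag_preempt_blocked(int cpu)
{
	return diag[cpu].preempt_depth > 0;
}

/* Print the counters that a stall report might include for @cpu. */
static void diag_report(int cpu)
{
	printf("cpu %d: preempt_disable=%lu cond_resched=%lu blocked=%d\n",
	       cpu, diag[cpu].preempt_disable_calls,
	       diag[cpu].cond_resched_calls, diag_preempt_blocked(cpu));
}
```

The depth field answers the "is it currently blocked" question, while the
running counts show whether cond_resched() is being reached at all.
Calling diag_report() for the stalled CPU at warning time would print both.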

>
> Thanx, Paul
>

--
Regards,
Zhen Lei