Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning

From: Liu Song
Date: Wed Aug 03 2022 - 05:26:04 EST


* Liu Song <liusong@xxxxxxxxxxxxxxxxx> wrote:

From: Liu Song <liusong@xxxxxxxxxxxxxxxxx>

If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
128-core machine, the rcu stall warning will be triggered. Moreover,
sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
and no rcu stall warning will appear after adjustment.

That doesn't mean it doesn't have to be protected by *any* lock - which
your patch implements AFAICS.

There's a couple of lines such as:

for_each_online_cpu(cpu) {

Hi,

Here I refer to the implementation of "sysrq_timer_list_show", and I don't
see any lock.

Maybe there is a problem with the implementation of "sysrq_timer_list_show".

But we are talking about sysrq_sched_debug_show(), which your patch tries
to relax the RCU locking of.

Hi,

I'm not sure whether for_each_online_cpu() and print_cpu() need a lock to
protect them, so I looked at other code in the kernel for reference. Some
places use "get_online_cpus" to prevent CPU hotplug, but many places have
no obvious protection, so I'm also unsure whether protection is strictly
required.


Thanks


Thanks,

Ingo