Re: NOHZ tick-stop error: Non-RCU local softirq work is pending

From: Paul E. McKenney
Date: Thu Dec 10 2020 - 17:55:24 EST


And please see attached. Lots of output, in fact, enough that it
was still dumping when the second instance happened.

Thanx, Paul

On Thu, Dec 10, 2020 at 03:56:37PM +0100, Frederic Weisbecker wrote:
> Hi,
>
> On Wed, Nov 18, 2020 at 09:52:18AM -0800, Paul E. McKenney wrote:
> > Hello, Frederic,
> >
> > Here is the last few months' pile of warnings from rcutorture runs.
> >
> > Thanx, Paul
> >
> > [ 255.098527] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #282!!!
> > [ 414.534548] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 3798.654736] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 1718.589367] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 6632.777655] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 2873.688490] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 3081.738937] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 2673.597523] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 1467.372887] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 34.371094] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 1147.260097] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 5066.699589] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 816.338843] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 34.338836] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 1234.111394] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 1282.109415] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 239.215890] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 367.918969] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 1461.037894] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 1503.810903] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 1503.811939] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 699.514824] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 751.681629] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 287.770126] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 287.771096] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 648.009370] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 924.733405] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 924.734011] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 1743.197353] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #02!!!
> > [ 1528.161635] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 1528.162313] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 265.201513] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 473.137587] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
> > [ 187.375426] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
> > [ 1361.544451] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
> > [ 79.519727] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #280!!!
>
> Would you be willing to run TREE05 for me until it triggers the issue with:
>
> trace_event=softirq_raise trace_options=stacktrace
>
> And with the below patch, thanks! (make sure you have CONFIG_EVENT_TRACING=y)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 81632cd5e3b7..1751e2d9a5b5 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -929,6 +929,8 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>  			pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
>  				(unsigned int) local_softirq_pending());
> +			dump_stack();
> +			ftrace_dump(DUMP_ORIG);
>  			ratelimit++;
>  		}
>  		return false;
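
For readers decoding the "handler #NN" values in the warnings quoted above: the pr_warn() call in the patch prints the whole local_softirq_pending() value in hex, so each number is a bitmask with one bit per softirq vector, and it can include the RCU bit alongside the others. The userspace sketch below is illustrative only. It assumes the vector numbering of the softirq enum in include/linux/interrupt.h from kernels of this period (HI=0 through RCU=9), and decode_softirq_mask() is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Softirq vector names in bit order. Assumption: this matches the enum
 * in include/linux/interrupt.h of the kernel that produced the log.
 */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

#define NR_NAMES (sizeof(softirq_names) / sizeof(softirq_names[0]))

/*
 * Hypothetical helper: write the names of the bits set in @pending into
 * @buf, separated by '|', and return @buf. Bits beyond the table are
 * ignored; output stops early if @buf is too small.
 */
static char *decode_softirq_mask(unsigned int pending, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int bit;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (bit = 0; bit < NR_NAMES; bit++) {
		int n;

		if (!(pending & (1u << bit)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", softirq_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* truncated, keep what fits */
		used += (size_t)n;
	}
	return buf;
}
```

Under that assumed numbering, decode_softirq_mask(0x80, ...) gives "SCHED", 0x280 gives "SCHED|RCU", 0x202 gives "TIMER|RCU", 0x282 gives "TIMER|SCHED|RCU", and 0x02 gives "TIMER".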

Attachment: frederic.trace.gz
Description: frederic.trace.gz