Re: RCU stall when using function_graph

From: Paul E. McKenney
Date: Sun Aug 06 2017 - 13:02:44 EST


On Sat, Aug 05, 2017 at 02:24:21PM +0900, Guillermo Austin Kim wrote:
> Dear All
>
> As for me, after configuring function_graph as below, the crash disappears.
> "echo 0 > d/tracing/tracing_on"
> "sleep 1"
>
> "echo function_graph > d/tracing/current_tracer"
> "sleep 1"
>
> "echo smp_call_function_single > d/tracing/set_ftrace_filter"
> adb shell "sleep 1"
>
> "echo 1 > d/tracing/tracing_on"
> adb shell "sleep 1"
>
> Right after function_graph is enabled, too many logs are traced on each IRQ,
> which often eventually causes a stall.

That would do it!

Hmmm...

Steven, would it be helpful if RCU were to inform tracing (say) halfway
through the RCU CPU stall interval, allowing the tracer to do something
like cond_resched_rcu_qs()? I can imagine all sorts of reasons why this
wouldn't work, for example, if all the tracing was with irqs disabled
or some such, but figured I should ask.

Does Guillermo's approach work for others?
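
For anyone who wants to try it, the quoted steps can be sketched as a small
script. This is only a sketch under assumptions: the quoted mail writes to
"d/tracing", and the /sys/kernel/debug/tracing default below assumes the
usual debugfs mount point. TRACING_DIR, SETTLE_SECS and the function name
limit_function_graph are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch of the quoted sequence: restrict function_graph to one function
# before turning tracing back on.
# TRACING_DIR and SETTLE_SECS are hypothetical knobs, not from the original mail.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"
SETTLE_SECS="${SETTLE_SECS:-1}"

limit_function_graph() {
    dir="$1"
    # Stop tracing first so nothing is recorded while reconfiguring.
    echo 0 > "$dir/tracing_on" || return 1
    sleep "$SETTLE_SECS"
    # Select the function_graph tracer.
    echo function_graph > "$dir/current_tracer" || return 1
    sleep "$SETTLE_SECS"
    # Trace only this one function instead of every kernel function.
    echo smp_call_function_single > "$dir/set_ftrace_filter" || return 1
    sleep "$SETTLE_SECS"
    # Re-enable tracing with the narrow filter in place.
    echo 1 > "$dir/tracing_on" || return 1
}
```

Run as root, this would be invoked as: limit_function_graph "$TRACING_DIR"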

Thanx, Paul

> BR,
> Guillermo Austin Kim
>
> On Aug 3, 2017 at 11:38 PM, "Daniel Lezcano" <daniel.lezcano@xxxxxxxxxx> wrote:
>
> On Thu, Aug 03, 2017 at 05:44:21AM -0700, Paul E. McKenney wrote:
>
> [ ... ]
>
> > > > BTW, function_graph tracer is the most invasive of the tracers. It's
> > > > 4x slower than function tracer. I'm wondering if the tracer isn't the
> > > > cause, but just slows things down enough to cause some other race
> > > > condition that triggers the bug.
> > >
> > > Yes, that could be true.
> > >
> > > I tried the following scenario:
> > >
> > > - cpufreq governor => userspace + max_freq (1.2GHz)
> > > - function_graph set ==> OK
> > >
> > > - cpufreq governor => userspace + min_freq (200MHz)
> > > - function_graph set ==> RCU stall
> > >
> > > Besides that, I realized the board is constantly processing SOF interrupts
> > > every 124us, so that adds more overhead.
> > >
> > > After removing USB support, and with it the processing of the SOF
> > > interrupts, I no longer see the RCU stall.
> >
> > Looks like Steve called this one! ;-)
>
> Yep :)
>
> > > Is the system expected to hang after an RCU stall is raised?
> >
> > No, but if NMI stack traces are enabled and there are any NMI problems,
> > bad things can happen. In addition, the bulk of output can cause problems
> > if you have a slow console connection.
>
> Ok, thanks.
>
> -- Daniel