Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

From: Ross Green
Date: Thu Feb 18 2016 - 22:56:24 EST


On Fri, Feb 19, 2016 at 10:14 AM, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> ----- On Feb 18, 2016, at 6:51 AM, Ross Green rgkernel@xxxxxxxxx wrote:
>
>> On Thu, Feb 18, 2016 at 10:19 AM, Paul E. McKenney
>> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>>> On Wed, Feb 17, 2016 at 12:28:29PM -0800, Paul E. McKenney wrote:
>>>> On Wed, Feb 17, 2016 at 08:45:54PM +0100, Peter Zijlstra wrote:
>>>> > On Wed, Feb 17, 2016 at 11:28:17AM -0800, Paul E. McKenney wrote:
>>>> > > On Tue, Feb 16, 2016 at 09:45:49PM -0800, Paul E. McKenney wrote:
>>>> > > > On Tue, Feb 09, 2016 at 09:11:55PM +1100, Ross Green wrote:
>>>> > > > > Continued testing with the latest linux-4.5-rc3 release.
>>>> > > > >
>>>> > > > > Please find attached a copy of traces from dmesg:
>>>> > > > >
>>>> > > > > There is a lot more debug and trace data so hopefully this will shed
>>>> > > > > some light on what might be happening here.
>>>> > > > >
>>>> > > > > My testing remains run a series of simple benchmarks, let that run to
>>>> > > > > completion and then leave the system idle away with just a few daemons
>>>> > > > > running.
>>>> > > > >
>>>> > > > > the self detected stalls in this instance turned up after a days run time.
>>>> > > > > There were NO heavy artificial computational loads on the machine.
>>>> > > >
>>>> > > > It does indeed look quiet on that dmesg for a good long time.
>>>> > > >
>>>> > > > The following insanely crude not-for-mainline hack -might- be producing
>>>> > > > good results in my testing. It will take some time before I can claim
>>>> > > > statistically different results. But please feel free to give it a go
>>>> > > > in the meantime. (Thanks to Al Viro for pointing me in this direction.)
>>>> >
>>>> > Your case was special in that is was hotplug triggering it, right?
>>>>
>>>> Yes, it has thus far only shown up with CPU hotplug enabled.
>>>>
>>>> > I was auditing the hotplug paths involved when I fell ill two weeks ago,
>>>> > and have not really made any progress on that because of that :/
>>>>
>>>> I have always said that being sick is bad for one's health, but I didn't
>>>> realize that it could be bad for the kernel's health as well. ;-)
>>>>
>>>> > I'll go have another look, I had a vague feeling for a race back then,
>>>> > lets see if I can still remember how..
>>>>
>>>> I believe that I can -finally- get an ftrace_dump() to happen within
>>>> 10-20 milliseconds of the problem, which just might be soon enough
>>>> after the problem to gather some useful information. I am currently
>>>> testing this theory with "ftrace trace_event=sched_waking,sched_wakeup"
>>>> boot arguments on a two-hour run.
>>>
>>> And apparently another way to greatly reduce the probability of this
>>> bug occurring is to enable ftrace. :-/
>>>
>>> Will try longer runs.
>>>
>>> Thanx, Paul
>>>
>>>> If this works out, what would be a useful set of trace events for me
>>>> to capture?
>>>>
>>>> Thanx, Paul
>>>
>>
>> Well managed to catch this one on linux-4.5-rc4.
>>
>> Took over 3 days and 7 hours to hit.
>>
>> Same test as before, boot, run a series of simple benchmarks and then
>> let the machine just idle away.
>>
>> As before, the reported stall, AND everything keeps on running as if
>> nothing had happened.
>>
>> I notice in the task dump for both the cpus, the swapper is running on
>> both cpus.
>>
>> does that make any sense?
>> There is around 3% of memory actually used.
>>
>> Anyway, please find attached a copy of the dmesg output.
>>
>> Hope this helps a few people fill in the missing pieces here.
>
> What seems weird here is that all code paths in the loop
> perform a WRITE_ONCE(rsp->gp_activity, jiffies), which
> implies progress in each case:
>
> - rcu_gp_init() does it,
> - both branches in the QS forcing loop do it, either
> through rcu_gp_fqs(), or directly,
>
> This means the thread is really stalled, and the backtrace
> shows those threads are stalled on the
>
> ret = wait_event_interruptible_timeout(rsp->gp_wq,
> rcu_gp_fqs_check_wake(rsp, &gf), j);
>
> Since this is a *_timeout wait, for which the timeout
> is bounded by "j" jiffies which is bounded by "HZ" value,
> we should really not stay there too long, even if we are
> not awakened by whatever is supposed to awaken us.
>
> So unless I'm missing something, it would look like
> schedule_timeout() is missing its timeout there.
>
> Perhaps we only experience this missed timeout here
> because typically there is always a wakeup coming sooner
> or later on relatively busy systems. This one is idle
> for quite a while.
>
> Thoughts ?
>
> Thanks,
>
> Mathieu
>
>
>>
>> Regards,
>>
>> Ross Green
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com


Well a bonus extra!
Kept everything running and there was another stall.
So i have included the demsg output for perusal.

Just to clear things up there is no hotplug involved in this system.
It is a standard Pandaboard ES Ti4460 two processor system.
I use this for testing as a generic armv7 processor, plus can keep it
just running along for testing for a long time. the system has a total
of 23-25 process running on average. Mainly standard daemons. There is
certainly no heavy processing going on. I run a series of benchmarks
that are cpu intensive for the first 20 miinutes after boot and then
just leave it idle away. checking every so often to see how it has
gone.
As mentioned I have observed these stalls going back to 3.17 kernel.
It will often take up to a week to record such a stall. I will
typically test every new release kernel, so the -rc? series will get
around a weeks testing.

Regards,

Ross Green

Attachment: dmesg-4.5-rc4-2
Description: Binary data