Re: rcu_preempt detected stalls.

From: Paul E. McKenney
Date: Thu Oct 23 2014 - 16:48:19 EST


On Thu, Oct 23, 2014 at 04:28:16PM -0400, Dave Jones wrote:
> On Thu, Oct 23, 2014 at 12:52:21PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 23, 2014 at 03:37:59PM -0400, Dave Jones wrote:
> > > On Thu, Oct 23, 2014 at 12:28:07PM -0700, Paul E. McKenney wrote:
> > >
> > > > > > This one will require more looking. But did you do something like
> > > > > > create a pair of mutually recursive symlinks or something? ;-)
> > > > >
> > > > > I'm not 100% sure, but this may have been on a box that I was running
> > > > > tests on NFS. So maybe the server had disappeared with the mount
> > > > > still active..
> > > > >
> > > > > Just a guess tbh.
> > > >
> > > > Another possibility might be that the box was so overloaded that tasks
> > > > were getting preempted for 21 seconds as a matter of course, and sometimes
> > > > within RCU read-side critical sections. Or did the box have ample idle
> > > > time?
> > >
> > > I fairly recently upped the number of child processes I typically run
> > > with, so it being overloaded does sound highly likely.
> >
> > Ah, that could do it! One way to test extreme loads and not trigger
> > RCU CPU stall warnings might be to make all of your child processes
> > sleep during a given interval of a few hundred milliseconds during each
> > ten-second interval. Would that work for you?
>
> This feels like hiding from the problem rather than fixing it.
> I'm not sure it even makes sense to add sleeps to the fuzzer, other than
> to slow things down, and if I were to do that, I may as well just run
> it with fewer threads instead.

I was thinking of the RCU CPU stall warnings that were strictly due to
overload as being false positives. If trinity caused a kthread to loop
within an RCU read-side critical section, you would still get the RCU
CPU stall warning even with the sleeps.

But just a suggestion, no strong feelings. Might change if there is an
excess of false-positive RCU CPU stall warnings, of course. ;-)
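
The synchronized-sleep suggestion quoted above could be sketched roughly
as follows. This is only an illustration, not code from trinity or from
the kernel: the names quiet_ms_remaining() and maybe_quiesce(), and the
300 ms / 10 s values, are assumptions picked to match "a few hundred
milliseconds during each ten-second interval". Every child derives the
window from the same system-wide clock, so they all go idle together
without needing to talk to each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical values: a 300 ms quiet window at the start of every 10 s. */
#define PERIOD_MS 10000
#define QUIET_MS    300

/* Milliseconds left in the current quiet window, or 0 if outside it. */
static int64_t quiet_ms_remaining(int64_t now_ms)
{
	int64_t phase = now_ms % PERIOD_MS;

	return phase < QUIET_MS ? QUIET_MS - phase : 0;
}

/* Hypothetical hook: each child would call this between iterations. */
static void maybe_quiesce(void)
{
	struct timespec now, nap;
	int64_t now_ms, left;

	/* CLOCK_MONOTONIC is shared by all processes, so the windows line up. */
	clock_gettime(CLOCK_MONOTONIC, &now);
	now_ms = (int64_t)now.tv_sec * 1000 + now.tv_nsec / 1000000;

	left = quiet_ms_remaining(now_ms);
	if (left == 0)
		return;

	nap.tv_sec = left / 1000;
	nap.tv_nsec = (left % 1000) * 1000000L;
	nanosleep(&nap, NULL);	/* waking early on a signal is tolerable here */
}
```

A child blocked inside a long-running system call will miss its window,
so this only thins out the overload-induced warnings rather than
guaranteeing that none appear. A kthread looping within an RCU read-side
critical section is unaffected and would still trigger the warning.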

> While the fuzzer is doing pretty crazy stuff, what's different about it
> from any other application that overcommits the CPU with too many threads?

The (presumably) much higher probability of being preempted in the kernel,
and thus within an RCU read-side critical section.

> We impose rlimits to stop people from forkbombing and the like, but this
> doesn't even need that many processes to trigger, and with some effort
> could probably be done with even fewer if I found ways to keep other cores
> busy in the kernel for long enough.
>
> That all said, I don't have easy reproducers for this right now, due
> to other bugs manifesting long before this gets to be a problem.

Fair enough! ;-)

Thanx, Paul
