Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used

From: Paul E. McKenney
Date: Fri Jun 13 2014 - 12:16:42 EST


On Fri, Jun 13, 2014 at 06:00:04PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 08:52:33AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:47:16PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Jun 12, 2014 at 06:35:15PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > > CONFIG_NO_HZ_FULL may be widely enabled on distros nowadays, but its
> > > > > > actual users should be a tiny minority, if there are any at all.
> > > > > >
> > > > > > Also, there is a risk that affining the GP kthread to a single CPU could
> > > > > > end up noticeably reducing RCU performance and increasing energy
> > > > > > consumption.
> > > > > >
> > > > > > So let's affine the GP kthread only when nohz_full is actually in use
> > > > > > (i.e., when the nohz_full= boot parameter is passed or CONFIG_NO_HZ_FULL_ALL=y is set).
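
For reference, the proposed change boils down to an early return in
rcu_bind_gp_kthread() -- a sketch only, assuming tick_nohz_full_enabled()
is the gate, so the actual patch may differ in detail:

	static void rcu_bind_gp_kthread(void)
	{
		int cpu;

		/* Nothing to do unless nohz_full is actually in use. */
		if (!tick_nohz_full_enabled())
			return;

		/* Otherwise, pin ourselves to the timekeeping CPU. */
		cpu = ACCESS_ONCE(tick_do_timer_cpu);
		if (cpu < 0 || cpu >= nr_cpu_ids)
			return;
		if (raw_smp_processor_id() != cpu)
			set_cpus_allowed_ptr(current, cpumask_of(cpu));
	}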
> > > >
> > > > Which reminds me... Kernel-heavy workloads running NO_HZ_FULL_ALL=y
> > > > can see long RCU grace periods, as in about two seconds each. It is
> > > > not hard for me to detect this situation.
> > >
> > > Ah yeah, that sounds quite long.
> > >
> > > > Is there some way I can
> > > > call for a given CPU's scheduling-clock interrupt to be turned on?
> > >
> > > Yeah, once the nohz kick patchset (https://lwn.net/Articles/601214/) is merged,
> > > a simple call to tick_nohz_full_kick_cpu() should do the trick, although the
> > > right condition must be checked on the IPI side, maybe with rcu_needs_cpu() or some such.
> >
> > I could record the offending GP, and make rcu_needs_cpu() return true
> > if the current GP matches the offending one.
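
Something along these lines, perhaps -- purely a hypothetical sketch,
where ->gp_kick_gpnum and rcu_gp_tick_needed() are illustrative names,
not existing kernel code:

	/* In the grace-period kthread, on detecting an over-long GP: */
	ACCESS_ONCE(rsp->gp_kick_gpnum) = rsp->gpnum; /* Record offender. */
	tick_nohz_full_kick_cpu(cpu);	/* Poke the holdout CPU's tick. */

	/* On the IPI side, keep the tick on while that GP is in flight: */
	static bool rcu_gp_tick_needed(struct rcu_state *rsp)
	{
		return ACCESS_ONCE(rsp->gp_kick_gpnum) ==
		       ACCESS_ONCE(rsp->gpnum);
	}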
> >
> > > But it would be interesting to identify the sources of these extended grace periods.
> > > If we only restart the tick, we may be ignoring some deeper outstanding issue.
> >
> > Some of them have been fixable by other means, but they will probably
> > come back as system sizes grow. And I really have put preemption points
> > into kernel code in response to RCU CPU stall warnings, and the current
> > state of NO_HZ_FULL effectively ignores these preemption points. :-/
>
> I'm not sure I really understand the issue though. So you have RCU CPU stalls due
> to very extended grace periods, right?
>
> I'm not sure how preemption points would solve that. Or maybe you're
> trying to trigger quiescent-state reports through these preemption points?

If we have scheduling-clock interrupts, the preemption points will help
push RCU through its state machine. If we don't have scheduling-clock
interrupts, RCU can't make progress in this case.
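
The preemption points in question are mostly cond_resched() calls
sprinkled into long-running in-kernel loops, along these lines
(process_item() is just a stand-in for the real work):

	for (i = 0; i < nr_items; i++) {
		process_item(i);	/* Potentially long-running work. */
		cond_resched();		/* Maybe reschedule, which RCU
					   sees as a quiescent state. */
	}

With the tick running, the scheduling-clock interrupt eventually sets
TIF_NEED_RESCHED, cond_resched() then context-switches, and RCU notes a
quiescent state. Without the tick, cond_resched() almost never fires,
so those preemption points do RCU no good.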

> Is it because we have dynticks CPUs staying too long in the kernel without
> passing through any quiescent states? Are we perhaps missing some
> rcu_user_enter() calls or some such?

Sort of the former, but combined with the fact that in-kernel CPUs still
need scheduling-clock interrupts for RCU to make progress. I could
move this to RCU's context-switch hook, but that could be very bad for
workloads that do lots of context switching.

Thanx, Paul
