Re: RCU vs NOHZ

From: Paul E. McKenney
Date: Thu Sep 29 2022 - 11:21:02 EST


On Thu, Sep 29, 2022 at 12:55:58PM +0200, Peter Zijlstra wrote:
> On Sat, Sep 17, 2022 at 07:25:08AM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 16, 2022 at 11:20:14AM +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 16, 2022 at 12:58:17AM -0700, Paul E. McKenney wrote:
> > >
> > > > To the best of my knowledge at this point in time, agreed. Who knows
> > > > what someone will come up with next week? But for people running certain
> > > > types of real-time and HPC workloads, context tracking really does handle
> > > > both idle and userspace transitions.
> > >
> > > Sure, but idle != nohz. Nohz is where we disable the tick, and currently
> > > RCU can inhibit this -- rcu_needs_cpu().
> >
> > Exactly. For non-nohz userspace execution, the tick is still running
> > anyway, so RCU of course won't be inhibiting its disabling. And in that
> > case, RCU's hook is the tick interrupt itself. RCU's hook is passed a
> > flag saying whether the interrupt came from userspace or from kernel.
>
> I'm not sure how we ended up here; this is completely irrelevant and I'm
> not disagreeing with it.
>
> > > AFAICT there really isn't an RCU hook for this, neither through
> > > context tracking nor through anything else.
> >
> > There is a directly invoked RCU hook for any transition that enables or
> > disables the tick, namely the ct_*_enter() and ct_*_exit() functions,
> > that is, those functions formerly known as rcu_*_enter() and rcu_*_exit().
>
> Context tracking doesn't know about NOHZ, therefore RCU can't either.
> Context tracking knows about IDLE, but not all IDLE is NOHZ-IDLE.
>
> Specifically we have:
>
> ct_{idle,irq,nmi,user,kernel}_enter()
>
> And none of them are related to NOHZ in the slightest. So no, RCU does
> not have a NOHZ callback.
>
> I'm still thinking you're conflating NOHZ_FULL (stopping the tick when
> in userspace) and regular NOHZ (stopping the tick when idle).
>
> > And this of course means that any additional schemes to reduce RCU's
> > power consumption must be compared (with real measurements on real
> > hardware!) to Joel et al.'s work, whether in combination or as an
> > alternative. And either way, the power savings must of course justify
> > the added code and complexity.
>
> Well, Joel's lazy scheme has the difficulty that you can wreck things by
> improperly marking the callback as lazy when there's an explicit
> dependency on it. The talk even called that out.
>
> I was hoping to construct a scheme that doesn't need the whole lazy
> approach.
>
>
> To recap: we want the CPU to go into deeper idle states, no?
>
> RCU can currently inhibit this by having callbacks pending for this CPU
> -- in this case RCU inhibits NOHZ-IDLE and deep power states are either
> not selected or less effective.
>
> Now, deep idle states actually purge the caches, so cache locality
> cannot be an argument to keep the callbacks local.
>
> We know when we're doing deep idle we stop the tick.
>
> So why not, when stopping the tick, move the RCU pending crud elsewhere
> and let the CPU get on with going idle instead of inhibiting the
> stopping of the tick and wrecking deep idle?

Because doing so in the past has cost more energy than is saved.

Thanx, Paul
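
The two tick-stop policies debated in this thread can be contrasted with a
small userspace toy model. This is only a sketch under stated assumptions:
struct toy_cpu, toy_try_stop_tick() and toy_stop_tick_offload() are
hypothetical names invented for illustration and are not kernel APIs. The
first function mimics the behaviour attributed to rcu_needs_cpu() above
(pending callbacks keep the tick running); the second mimics the proposal to
move the pending callbacks elsewhere before stopping the tick.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
struct toy_cpu {
	int  pending_cbs;   /* RCU callbacks queued on this CPU */
	bool tick_stopped;  /* true once the tick has been stopped */
};

/*
 * Policy described as current behaviour in the thread: pending
 * callbacks inhibit stopping the tick, so the CPU cannot go NOHZ-idle.
 * Returns true if the tick was stopped.
 */
static bool toy_try_stop_tick(struct toy_cpu *cpu)
{
	if (cpu->pending_cbs > 0)
		return false;   /* callbacks pending: the tick stays on */
	cpu->tick_stopped = true;
	return true;
}

/*
 * Policy proposed in the thread: hand the pending callbacks to another
 * CPU, then stop the tick unconditionally.
 */
static void toy_stop_tick_offload(struct toy_cpu *cpu, struct toy_cpu *target)
{
	assert(cpu != target);  /* offloading to oneself would be a no-op */
	target->pending_cbs += cpu->pending_cbs;
	cpu->pending_cbs = 0;
	cpu->tick_stopped = true;
}
```

With three callbacks queued on a CPU, toy_try_stop_tick() refuses to stop the
tick, whereas toy_stop_tick_offload() stops it and leaves the three callbacks
on the target CPU, which in turn can no longer stop its own tick under the
first policy. The model deliberately says nothing about energy, so it cannot
settle the objection raised in the reply; that would need measurements on real
hardware.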