Re: RCU vs NOHZ

From: Peter Zijlstra
Date: Thu Sep 29 2022 - 06:56:14 EST


On Sat, Sep 17, 2022 at 07:25:08AM -0700, Paul E. McKenney wrote:
> On Fri, Sep 16, 2022 at 11:20:14AM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 16, 2022 at 12:58:17AM -0700, Paul E. McKenney wrote:
> >
> > > To the best of my knowledge at this point in time, agreed. Who knows
> > > what someone will come up with next week? But for people running certain
> > > types of real-time and HPC workloads, context tracking really does handle
> > > both idle and userspace transitions.
> >
> > Sure, but idle != nohz. Nohz is where we disable the tick, and currently
> > RCU can inhibit this -- rcu_needs_cpu().
>
> Exactly. For non-nohz userspace execution, the tick is still running
> anyway, so RCU of course won't be inhibiting its disabling. And in that
> case, RCU's hook is the tick interrupt itself. RCU's hook is passed a
> flag saying whether the interrupt came from userspace or from kernel.

I'm not sure how we ended up here; this is completely irrelevant and I'm
not disagreeing with it.

> > AFAICT there really isn't an RCU hook for this, not through context
> > tracking not through anything else.
>
> There is a directly invoked RCU hook for any transition that enables or
> disables the tick, namely the ct_*_enter() and ct_*_exit() functions,
> that is, those functions formerly known as rcu_*_enter() and rcu_*_exit().

Context tracking doesn't know about NOHZ, therefore RCU can't either.
Context tracking knows about IDLE, but not all IDLE is NOHZ-IDLE.

Specifically we have:

ct_{idle,irq,nmi,user,kernel}_enter()

And none of them are related to NOHZ in the slightest. So no, RCU does
not have a NOHZ callback.
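
As a toy illustration of that point (not kernel code; every model_*
name below is made up for this sketch), the context-tracking state and
the tick state can be modelled as two independent variables. The idle
hook only ever touches the first one, so nothing it does can tell RCU
whether the tick was stopped:

```c
#include <stdbool.h>

/* Hypothetical model: which context the CPU is in. */
enum model_ctx { MODEL_CTX_KERNEL, MODEL_CTX_USER, MODEL_CTX_IDLE };

struct model_cpu {
	enum model_ctx ctx;	/* what context tracking sees */
	bool tick_stopped;	/* what NOHZ decides, separately */
};

/* Stand-in for an idle-entry hook: records the context, nothing else. */
static void model_idle_enter(struct model_cpu *c)
{
	c->ctx = MODEL_CTX_IDLE;
}

/* Stand-in for an idle-exit hook: back to kernel context. */
static void model_idle_exit(struct model_cpu *c)
{
	c->ctx = MODEL_CTX_KERNEL;
}

/* Stand-in for the tick code stopping the tick; no context hook runs. */
static void model_tick_stop(struct model_cpu *c)
{
	c->tick_stopped = true;
}

/* NOHZ-IDLE needs both conditions; IDLE alone is not enough. */
static bool model_is_nohz_idle(const struct model_cpu *c)
{
	return c->ctx == MODEL_CTX_IDLE && c->tick_stopped;
}
```

In this model a CPU can sit in MODEL_CTX_IDLE with the tick still
running, which is exactly the IDLE-but-not-NOHZ-IDLE case.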

I still think you're conflating NOHZ_FULL (stopping the tick when
in userspace) and regular NOHZ (stopping the tick when idle).

> And this of course means that any additional schemes to reduce RCU's
> power consumption must be compared (with real measurements on real
> hardware!) to Joel et al.'s work, whether in combination or as an
> alternative. And either way, the power savings must of course justify
> the added code and complexity.

Well, Joel's lazy scheme has the difficulty that you can wreck things by
improperly marking the callback as lazy when there's an explicit
dependency on it. The talk even called that out.

I was hoping to construct a scheme that doesn't need the whole lazy
approach.


To recap: we want the CPU to go into deeper idle states, no?

RCU can currently inhibit this by having callbacks pending for this CPU
-- in this case RCU inhibits NOHZ-IDLE and deep power states are either
not selected or are less effective.

Now, deep idle states actually purge the caches, so cache locality
cannot be an argument to keep the callbacks local.

We know when we're doing deep idle we stop the tick.

So why not, when stopping the tick, move the RCU pending crud elsewhere
and let the CPU get on with going idle instead of inhibiting the
stopping of the tick and wrecking deep idle?
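
A rough userspace sketch of the idea, purely to show the shape of it.
Nothing here is the kernel's actual data structure or API: the model_*
names, MODEL_NR_CPUS and the choice of CPU 0 as the place to dump the
callbacks are all assumptions, and grace-period bookkeeping and locking
are ignored entirely. The first function models the behaviour described
above (pending callbacks veto stopping the tick), the second splices
the pending list onto another CPU and then stops the tick:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS		4
#define MODEL_HOUSEKEEPING_CPU	0	/* hypothetical: keeps its tick */

struct model_cb {
	struct model_cb *next;
};

struct model_cpu_cbs {
	struct model_cb *head;
	struct model_cb **tail;	/* points at the last ->next (or at head) */
	size_t nr_cbs;
	bool tick_stopped;
};

static struct model_cpu_cbs model_cpus[MODEL_NR_CPUS];

static void model_init(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		model_cpus[i].head = NULL;
		model_cpus[i].tail = &model_cpus[i].head;
		model_cpus[i].nr_cbs = 0;
		model_cpus[i].tick_stopped = false;
	}
}

/* Queue a callback at the tail of @cpu's pending list. */
static void model_queue_cb(int cpu, struct model_cb *cb)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	cb->next = NULL;
	*model_cpus[cpu].tail = cb;
	model_cpus[cpu].tail = &cb->next;
	model_cpus[cpu].nr_cbs++;
}

/* Behaviour as described above: pending callbacks inhibit the stop. */
static bool model_tick_stop_inhibited(int cpu)
{
	if (model_cpus[cpu].nr_cbs)
		return false;
	model_cpus[cpu].tick_stopped = true;
	return true;
}

/* Proposed shape: move the pending callbacks away, then stop the tick. */
static bool model_tick_stop_migrate(int cpu)
{
	struct model_cpu_cbs *src = &model_cpus[cpu];
	struct model_cpu_cbs *dst = &model_cpus[MODEL_HOUSEKEEPING_CPU];

	if (src != dst && src->head) {
		/* Splice src's list onto dst's tail, preserving order. */
		*dst->tail = src->head;
		dst->tail = src->tail;
		dst->nr_cbs += src->nr_cbs;
		src->head = NULL;
		src->tail = &src->head;
		src->nr_cbs = 0;
	}
	if (src->nr_cbs)	/* only the housekeeping CPU can get here */
		return false;
	src->tick_stopped = true;
	return true;
}
```

With two callbacks queued on CPU 2, model_tick_stop_inhibited(2)
refuses, while model_tick_stop_migrate(2) leaves CPU 2 empty with its
tick stopped and both callbacks, still in order, on CPU 0.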