Re: [PATCH 2/4] timer: relax tick stop in idle entry

From: Paul E. McKenney
Date: Mon Nov 16 2015 - 18:26:35 EST


On Mon, Nov 16, 2015 at 02:32:11PM -0800, Josh Triplett wrote:
> On Mon, Nov 16, 2015 at 01:51:26PM -0800, Jacob Pan wrote:
> > On Mon, 16 Nov 2015 16:06:57 +0100 (CET)
> > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > > > <idle>-0 [000] 30.093474: bprint:
> > > > __tick_nohz_idle_enter: JPAN: tick_nohz_stop_sched_tick 609 delta
> > > > 1000000 [JP] but sees delta is exactly 1 tick away. didn't stop
> > > > tick.
> > >
> > > If the delta is 1 tick then it is not supposed to stop it. Did you
> > > ever try to figure out WHY it is 1 tick?
> > >
> > > There are two code paths which can set it to basemono + TICK_NSEC:
> > >
> > > 	if (rcu_needs_cpu(basemono, &next_rcu) ||
> > > 	    arch_needs_cpu() || irq_work_needs_cpu()) {
> > > 		next_tick = basemono + TICK_NSEC;
> > > 	} else {
> > > 		next_tmr = get_next_timer_interrupt(basejiff, basemono);
> > > 		ts->next_timer = next_tmr;
> > > 		/* Take the next rcu event into account */
> > > 		next_tick = next_rcu < next_tmr ? next_rcu : next_tmr;
> > > 	}
> > >
> > > Can you please figure out WHY the tick is requested to continue
> > > instead of blindly wrecking the logic in that code?
> >
> > Looks like it hits both cases during forced idle.
> > + Josh
> > + Paul
> >
> > For the first case, it is always related to RCU. I found there are two
> > CONFIG options to avoid this undesired tick in idle loop.
> > 1. enable CONFIG_RCU_NOCB_CPU_ALL, offload to rcuo kthreads
> > 2. or enable CONFIG_RCU_FAST_NO_HZ (enter dyntick idle w/ rcu callback)
> >
> > Either one works but my concern is that users may not realize the
> > intricate CONFIG_ options and how they translate into energy savings.
> > Having consulted with Josh, it seems we could add a check here to
> > recognize the forced idle state and relax rcu_needs_cpu() to return false
> > even if it has callbacks. Since we are blocking everybody for a short time
> > (5 ticks by default), it should not impact synchronize_rcu() and kfree_rcu().
>
> Right; as long as you're blocking *everybody*, and RCU priority boosting
> doesn't come into play (meaning a real-time task is waiting on RCU
> callbacks), then I don't see any harm in blocking RCU callbacks for a
> while. You'd block completion of synchronize_rcu() and similar, as well
> as memory reclamation, but since you've blocked *every* CPU systemwide
> then that doesn't cause a problem.

True enough. But how does RCU distinguish between this being a
normal idle cycle that might last indefinitely on the one hand and the
five-jiffy system-wide throttling on the other? OK, maybe there is a
global variable that says that the just-now-starting idle period is
system-wide throttling. But then what about the CPU that just went
idle 10 microseconds ago, and therefore left its timer tick running?
Fine and well, we could IPI it to wake it up and let it see that we
are now doing thermal throttling. But then we presumably also have to
IPI it at the end of the thermal-throttling interval in order for it to
re-evaluate whether or not it should have the tick going. :-/
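
To make that ordering problem concrete, here is a toy user-space sketch,
not kernel code. Everything in it (forced_idle, struct toy_cpu,
toy_rcu_needs_cpu(), toy_idle_enter(), toy_race_demo()) is a hypothetical
name invented for illustration, and it assumes the tick decision is made
once at idle entry and not revisited until something wakes the CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical global flag: "the idle period now starting is throttling". */
static bool forced_idle;

/* Toy per-CPU state: pending RCU callbacks, and whether the tick was kept. */
struct toy_cpu {
	bool has_rcu_callbacks;
	bool tick_running;
};

static struct toy_cpu cpus[NR_CPUS];

/* Toy stand-in for the relaxed check: ignore callbacks during forced idle. */
static bool toy_rcu_needs_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpus[cpu].has_rcu_callbacks && !forced_idle;
}

/* Idle entry: the decision is made once, with the flag as seen right now. */
static void toy_idle_enter(int cpu)
{
	cpus[cpu].tick_running = toy_rcu_needs_cpu(cpu);
}

/* Returns how many CPUs still have their tick running after the sequence. */
static int toy_race_demo(void)
{
	int cpu, ticking = 0;

	forced_idle = false;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		cpus[cpu].has_rcu_callbacks = true;

	toy_idle_enter(0);	/* CPU 0 goes idle just before throttling */
	forced_idle = true;	/* throttling begins system-wide */
	toy_idle_enter(1);	/* CPU 1 goes idle and sees the flag */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		ticking += cpus[cpu].tick_running;
	return ticking;
}
```

In this sketch CPU 0 keeps its tick for the whole throttling interval
unless something pokes it to re-run the idle-entry decision, which is
where the pair of IPIs would come in.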

On the one hand, I am sure that all of this can be made to work,
but simply having systems using thermal throttling enable either
CONFIG_RCU_NOCB_CPU_ALL or CONFIG_RCU_FAST_NO_HZ seems -way- simpler.
CONFIG_RCU_FAST_NO_HZ is probably the better choice for generic workloads,
but CONFIG_RCU_NOCB_CPU_ALL is the better choice for embedded workloads
where it is less likely that RCU callbacks will be posted with continuous
wild abandon.

Or am I missing something subtle here?

Thanx, Paul
