Re: Energy-efficiency options within RCU

From: Joel Fernandes
Date: Mon Dec 14 2020 - 13:13:59 EST


On Thu, Dec 10, 2020 at 10:37:37AM -0800, Paul E. McKenney wrote:
> Hello, Joel,
>
> In case you are -seriously- interested... ;-)

I am always seriously interested :-). The trouble comes when life throws me a
curveball. This was the year of curveballs :-)

Thank you for your reply. I have added it to my list to investigate how we
are configuring nocb on our systems. I don't think anyone over here has given
these RCU issues a serious look.

thanks,

- Joel



> Thanx, Paul
>
> rcu_nocbs=
>
> Adding a CPU to this list offloads RCU callback invocation from
> that CPU's softirq handler to a kthread. In big.LITTLE systems,
> this kthread can be placed on a LITTLE CPU, which has been
> demonstrated to save significant energy in benchmarks.
> http://www.rdrop.com/users/paulmck/realtime/paper/AMPenergy.2013.04.19a.pdf
>
> nohz_full=
>
> Any CPU specified by this boot parameter is handled as if it was
> specified by rcu_nocbs=.
>
> rcutree.jiffies_till_first_fqs=
>
> Increasing this will decrease wakeup frequency to the grace-period
> kthread for the first FQS (force-quiescent-state) scan, and will
> also increase grace-period latency.
>
> rcutree.jiffies_till_next_fqs=
>
> Ditto, but for the second and subsequent FQS scans.
>
> My guess is that neither of these makes much difference. But if
> they do, maybe some sort of backoff scheme for FQS scans?
>
> rcutree.jiffies_till_sched_qs=
>
> Increasing this will delay RCU's getting excited about CPUs and
> tasks not responding with quiescent states. This excitement
> can cause extra overhead.
>
> No idea whether adjusting this would help. But if you increase
> rcutree.jiffies_till_first_fqs or rcutree.jiffies_till_next_fqs,
> you might need to increase this one accordingly.
>
> rcutree.qovld=
>
> Increasing this will increase the grace-period duration at which
> RCU starts sending IPIs, thus perhaps reducing the total number
> of IPIs that RCU sends. The destination CPUs are unlikely to be
> idle, so it is not clear to me that this would help much. But
> perhaps I am wrong about them being mostly non-idle, who knows?
>
> rcupdate.rcu_cpu_stall_timeout=
>
> If you get overly zealous about the earlier kernel boot parameters,
> you might need to increase this one as well. Or instead use the
> rcupdate.rcu_cpu_stall_suppress= kernel boot parameter to suppress
> RCU CPU stall warnings entirely.
>
> rcutree.rcu_nocb_gp_stride=
>
> Increasing this might reduce grace-period work somewhat. I don't
> see why a (say) 16-CPU system really needs to have more than one
> rcuog kthread, so if this does help it might be worthwhile setting
> a lower limit to this kernel parameter.
>
> rcutree.rcu_idle_gp_delay= (Only CONFIG_RCU_FAST_NO_HZ=y kernels.)
>
> This defaults to four jiffies on the theory that grace periods
> tend to last about that long. If grace periods tend to take
> longer, then it makes a lot of sense to increase this. And maybe
> battery-powered devices would rather have it be about 2x or 3x
> the expected grace-period duration, who knows?
>
> I would keep it to a power of two, but the code should work with
> other numbers. Except that I don't know that this has ever been
> tested. ;-)
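
Since several of these parameters are expressed in jiffies, their wall-clock
meaning depends on CONFIG_HZ. A minimal sketch of the conversion, assuming
only the usual relation of HZ jiffies per second (the helper name
jiffies_to_ms is made up for illustration and is not a kernel interface):

```python
def jiffies_to_ms(jiffies, hz):
    """Convert a jiffies count to milliseconds for a given CONFIG_HZ."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return jiffies * 1000.0 / hz


# The four-jiffy default mentioned above, at a few common CONFIG_HZ values.
for hz in (100, 250, 1000):
    print(f"HZ={hz}: 4 jiffies = {jiffies_to_ms(4, hz):g} ms")
```

So the same four-jiffy setting spans anywhere from 4 ms to 40 ms depending
on the tick rate, which is worth keeping in mind when comparing it against
measured grace-period durations.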
>
> srcutree.exp_holdoff=
>
> Increasing this decreases the number of SRCU grace periods that
> are treated as expedited. But you have to have closely-spaced
> SRCU grace periods for this to matter. (These do happen at least
> sometimes because I added this only because someone complained
> about the performance regression from the earlier non-tree SRCU.)
>
> rcupdate.rcu_task_ipi_delay=
>
> This kernel parameter delays sending IPIs for RCU Tasks Trace,
> which is used by sleepable BPF programs. Increasing it can
> reduce overhead, but can also increase the latency of removing
> sleepable BPF programs.
>
> rcupdate.rcu_task_stall_timeout=
>
> If you slow down RCU Tasks Trace too much, you may need this.
> But then again, the default 10-minute value should suffice.
>
> CONFIG_RCU_FAST_NO_HZ=y
>
> This only has effect on CPUs not specified by rcu_nocbs, and thus
> might be useful on systems that offload RCU callbacks only on
> some of the CPUs. For example, a big.LITTLE system might offload
> only the big CPUs. This Kconfig option reduces the frequency of
> timer interrupts (and thus of RCU-related softirq processing)
> on idle CPUs. This has been shown to save significant energy
> in benchmarks:
> http://www.rdrop.com/users/paulmck/realtime/paper/AMPenergy.2013.04.19a.pdf
>
> CONFIG_RCU_STRICT_GRACE_PERIOD=y
>
> This works hard (as in burns CPU) to sharply reduce grace-period
> latency. The effect is probably to greatly increase power
> consumption, but there might well be workloads where the shorter
> grace periods more than make up for the extra CPU time. Or not.
>
> CONFIG_HZ=
>
> Reducing the scheduler-clock interrupt frequency has the opposite
> effect, namely increasing RCU grace-period latency while also
> reducing RCU's CPU utilization.
>
> CONFIG_TASKS_TRACE_RCU_READ_MB=y
>
> Reduce the need to IPI RCU Tasks Trace holdout tasks, but at the
> expense of an increase in to/from idle overhead. This Kconfig
> option also slows down the rate at which RCU Tasks Trace polls
> for holdout tasks. This polling rate cannot be separately
> specified, but if changing the initial source-code values of
> either rcu_tasks_trace.gp_sleep or rcu_tasks_trace.init_fract
> proves useful, kernel boot parameters could be created.
>
> That said, automatic initialization heuristics are more
> convenient. When they work, anyway.
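
As a starting point for auditing which of the boot parameters above a given
system actually sets, here is a rough sketch that picks the RCU-related
entries out of a kernel command line. The helper names and the sample
command line are made up for illustration. It only handles plain
name=value tokens and simple CPU lists such as "1-3,6"; quoted values and
the more elaborate CPU-list forms the kernel accepts are not covered.

```python
# Parameter names and prefixes taken from the list quoted above.
RCU_PREFIXES = ("rcu_nocbs", "nohz_full", "rcutree.", "rcupdate.", "srcutree.")


def rcu_boot_params(cmdline):
    """Return RCU-related name -> value pairs from a kernel command line."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name.startswith(RCU_PREFIXES):
            params[name] = value
    return params


def expand_cpu_list(spec):
    """Expand a simple CPU list such as "1-3,6" into [1, 2, 3, 6]."""
    cpus = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


# Hypothetical command line, used here instead of a real system's settings.
sample = ("BOOT_IMAGE=/vmlinuz root=/dev/sda1 rcu_nocbs=4-7 "
          "rcutree.jiffies_till_first_fqs=8 quiet")

for name, value in rcu_boot_params(sample).items():
    print(f"{name} = {value}")

print("offloaded CPUs:", expand_cpu_list(rcu_boot_params(sample)["rcu_nocbs"]))
```

On a running system the string would come from /proc/cmdline rather than
the sample above.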