Re: [PATCH RFC v5] cpufreq: schedutil: Make iowait boost more energy efficient

From: Viresh Kumar
Date: Mon Jul 17 2017 - 04:04:55 EST


On 16-07-17, 01:04, Joel Fernandes wrote:
> Currently the iowait_boost feature in schedutil makes the frequency go to max
> on iowait wakeups. This feature was added to handle a case that Peter
> described where the throughput of operations involving continuous I/O requests
> [1] is reduced due to running at a lower frequency, however the lower
> throughput itself causes utilization to be low and hence causing frequency to
> be low hence its "stuck".
>
> Instead of going to max, its also possible to achieve the same effect by
> ramping up to max if there are repeated in_iowait wakeups happening. This patch
> is an attempt to do that. We start from a lower frequency (policy->mind)

s/mind/min/

> and double the boost for every consecutive iowait update until we reach the
> maximum iowait boost frequency (iowait_boost_max).
>
> I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86 machine
> with intel_pstate in passive mode using schedutil. In this test the iowait_boost
> value ramped from 800MHz to 4GHz in 60ms. The patch achieves the desired improved
> throughput as the existing behavior.
>
> Also while at it, make iowait_boost and iowait_boost_max as unsigned int since
> its unit is kHz and this is consistent with struct cpufreq_policy.
>
> [1] https://patchwork.kernel.org/patch/9735885/
>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> Cc: Len Brown <lenb@xxxxxxxxxx>
> Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> Cc: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Signed-off-by: Joel Fernandes <joelaf@xxxxxxxxxx>
> ---
> This version is based on some ideas from Viresh and Juri in v4. Viresh, one
> difference between the idea we just discussed is, I am scaling up/down the
> boost only after consuming it. This has the effect of slightly delaying the
> "deboost" but achieves the same boost ramp time. Its more cleaner in the code
> IMO to avoid the scaling up and then down on the initial boost. Note that I
> also dropped iowait_boost_min and now I'm just starting the initial boost from
> policy->min since as I mentioned in the commit above, the ramp of the
> iowait_boost value is very quick and for the usecase its intended for, it works
> fine. Hope this is acceptable. Thanks.
>
> kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 622eed1b7658..4225bbada88d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,8 +53,9 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> - unsigned long iowait_boost;
> - unsigned long iowait_boost_max;
> + bool iowait_boost_pending;
> + unsigned int iowait_boost;
> + unsigned int iowait_boost_max;
> u64 last_update;
>
> /* The fields below are only needed when sharing a policy. */
> @@ -172,30 +173,43 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + sg_cpu->iowait_boost_pending = true;
> + sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
> + sg_cpu->sg_policy->policy->min);
> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> /* Clear iowait_boost if the CPU apprears to have been idle. */
> - if (delta_ns > TICK_NSEC)
> + if (delta_ns > TICK_NSEC) {
> sg_cpu->iowait_boost = 0;
> + sg_cpu->iowait_boost_pending = false;
> + }

We don't really need to clear this flag here, as we are already setting
iowait_boost to 0 and that's what we check before using the boost.

> }
> }
>
> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> unsigned long *max)
> {
> - unsigned long boost_util = sg_cpu->iowait_boost;
> - unsigned long boost_max = sg_cpu->iowait_boost_max;
> + unsigned long boost_util, boost_max;
>
> - if (!boost_util)
> + if (!sg_cpu->iowait_boost)
> return;
>
> + boost_util = sg_cpu->iowait_boost;
> + boost_max = sg_cpu->iowait_boost_max;
> +

The above changes are not required anymore (and were required only
with my patch).

> if (*util * boost_max < *max * boost_util) {
> *util = boost_util;
> *max = boost_max;
> }
> - sg_cpu->iowait_boost >>= 1;
> +
> + if (sg_cpu->iowait_boost_pending) {
> + sg_cpu->iowait_boost_pending = false;
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> + sg_cpu->iowait_boost_max);

Now this has a problem: every later boost step will also take effect
only after waiting for rate_limit_us. That's why I had proposed the
tricky solution in the first place. I thought we wanted to avoid an
instant boost only for the first iteration, and after that we wanted to
boost ASAP. Isn't that so?

Now that you are using policy->min instead of policy->cur, we can
simplify the solution I proposed and always do 2 * iowait_boost before
the if block above updates util/max, i.e. we will start the iowait
boost with min * 2 instead of min, and that should be fine.
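
To make the suggestion concrete, here is a rough user-space sketch of one
possible reading of it, not kernel code and not a tested patch. The
struct boost_state type and the boost_on_iowait()/boost_apply() helpers
are hypothetical stand-ins for the iowait fields of struct sugov_cpu,
sugov_set_iowait_boost() and sugov_iowait_boost(). It assumes the decay
(>>= 1) stays after the boost is consumed, as in the patch, and only the
doubling moves in front of the util/max comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the iowait-related fields of struct sugov_cpu. */
struct boost_state {
	unsigned int iowait_boost;     /* current boost in kHz, 0 = no boost */
	unsigned int iowait_boost_max; /* upper bound in kHz */
	bool iowait_boost_pending;     /* set by an iowait wakeup */
};

/* Mirrors the SCHED_CPUFREQ_IOWAIT branch of the patch. */
static void boost_on_iowait(struct boost_state *s, unsigned int policy_min)
{
	s->iowait_boost_pending = true;
	if (s->iowait_boost < policy_min)
		s->iowait_boost = policy_min;
}

/*
 * Assumed variant: double the boost *before* comparing it with util/max,
 * so the first applied boost is min * 2 and later steps are not delayed
 * by one more update. Without a pending wakeup, halve after use.
 */
static void boost_apply(struct boost_state *s, unsigned long *util,
			unsigned long *max)
{
	bool pending = s->iowait_boost_pending;
	unsigned long boost_util, boost_max;

	if (!s->iowait_boost)
		return;

	if (pending) {
		unsigned int doubled = s->iowait_boost << 1;

		s->iowait_boost_pending = false;
		s->iowait_boost = doubled < s->iowait_boost_max ?
				  doubled : s->iowait_boost_max;
	}

	boost_util = s->iowait_boost;
	boost_max = s->iowait_boost_max;

	/* Same comparison as the patch: use the boost if its ratio is higher. */
	if (*util * boost_max < *max * boost_util) {
		*util = boost_util;
		*max = boost_max;
	}

	if (!pending)
		s->iowait_boost >>= 1;
}
```

With policy->min at 800000 kHz and iowait_boost_max at 4000000 kHz,
consecutive iowait wakeups would apply 1600000, then 3200000, then
4000000 under these assumptions, and the first update without a wakeup
would still apply 4000000 before halving the stored value.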

> + } else {
> + sg_cpu->iowait_boost >>= 1;
> + }
> }
>
> #ifdef CONFIG_NO_HZ_COMMON
> @@ -267,6 +281,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> delta_ns = time - j_sg_cpu->last_update;
> if (delta_ns > TICK_NSEC) {
> j_sg_cpu->iowait_boost = 0;
> + j_sg_cpu->iowait_boost_pending = false;

Not required here either.

> continue;
> }
> if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
> --
> 2.13.2.932.g7449e964c-goog

--
viresh