Re: [PATCH v7 00/11] track CPU utilization

From: Peter Zijlstra
Date: Thu Jul 05 2018 - 08:36:44 EST


On Thu, Jun 28, 2018 at 05:45:03PM +0200, Vincent Guittot wrote:
> Vincent Guittot (11):
> sched/pelt: Move pelt related code in a dedicated file
> sched/rt: add rt_rq utilization tracking
> cpufreq/schedutil: use rt utilization tracking
> sched/dl: add dl_rq utilization tracking
> cpufreq/schedutil: use dl utilization tracking
> sched/irq: add irq utilization tracking
> cpufreq/schedutil: take into account interrupt
> sched: schedutil: remove sugov_aggregate_util()
> sched: use pelt for scale_rt_capacity()
> sched: remove rt_avg code
> proc/sched: remove unused sched_time_avg_ms
>
> include/linux/sched/sysctl.h | 1 -
> kernel/sched/Makefile | 2 +-
> kernel/sched/core.c | 38 +---
> kernel/sched/cpufreq_schedutil.c | 65 ++++---
> kernel/sched/deadline.c | 8 +-
> kernel/sched/fair.c | 403 +++++----------------------------------
> kernel/sched/pelt.c | 399 ++++++++++++++++++++++++++++++++++++++
> kernel/sched/pelt.h | 72 +++++++
> kernel/sched/rt.c | 15 +-
> kernel/sched/sched.h | 68 +++++--
> kernel/sysctl.c | 8 -
> 11 files changed, 632 insertions(+), 447 deletions(-)
> create mode 100644 kernel/sched/pelt.c
> create mode 100644 kernel/sched/pelt.h

OK, this looks good I suppose. Rafael, are you OK with me taking these?

I have the below on top because I once again forgot how it all worked;
does this work for you Vincent?
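
For anyone else who keeps forgetting: the aggregation that the comments
below describe can be modelled in plain userspace C roughly as follows.
This is only a sketch; struct util_sample, its fields and effective_util()
are made-up names for illustration, not kernel interfaces, and the inputs
are assumed to already be in capacity units (0..max).

```c
/*
 * Userspace model of the utilization aggregation, NOT kernel code.
 * All names here are hypothetical stand-ins for the per-rq metrics.
 */
struct util_sample {
	unsigned long cfs;	/* stand-in for cpu_util_cfs() */
	unsigned long rt;	/* stand-in for cpu_util_rt() */
	unsigned long dl;	/* stand-in for cpu_util_dl() */
	unsigned long irq;	/* stand-in for cpu_util_irq() */
	unsigned long bw_dl;	/* stand-in for the DL bandwidth */
	int rt_runnable;	/* stand-in for rt_rq_is_runnable() */
};

static unsigned long effective_util(const struct util_sample *s,
				    unsigned long max)
{
	unsigned long util;

	/* A runnable RT task requests the maximum. */
	if (s->rt_runnable)
		return max;

	/* IRQ/steal time alone saturates the CPU. */
	if (s->irq >= max)
		return max;

	/* Same metric, synchronized windows: CFS and RT simply add up. */
	util = s->cfs + s->rt;

	/* No idle time left once measured DL utilization is included. */
	if (util + s->dl >= max)
		return max;

	/* U' = irq + (max - irq) / max * U */
	util *= (max - s->irq);
	util /= max;
	util += s->irq;

	/* DL bandwidth is always granted on top, clamped to max. */
	return util + s->bw_dl < max ? util + s->bw_dl : max;
}
```

With max = 1024, cfs = 300, rt = 100, dl = 50, irq = 128 and bw_dl = 64,
the task sum of 400 scales to 400 * 896 / 1024 = 350, the irq term brings
it to 478, and adding the DL bandwidth gives 542.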

---
Subject: sched/cpufreq: Clarify sugov_get_util()

Add a few comments (hopefully) clarifying some of the magic in
sugov_get_util().

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
cpufreq_schedutil.c | 69 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 51 insertions(+), 18 deletions(-)

--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -177,6 +177,26 @@ static unsigned int get_next_freq(struct
return cpufreq_driver_resolve_freq(policy, freq);
}

+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ * cpu_util_{cfs,rt,dl,irq}()
+ * cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued in
+ * the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal u required to meet
+ * deadlines.
+ */
static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
{
struct rq *rq = cpu_rq(sg_cpu->cpu);
@@ -188,26 +208,50 @@ static unsigned long sugov_get_util(stru
if (rt_rq_is_runnable(&rq->rt))
return max;

+ /*
+ * Early check to see if IRQ/steal time saturates the CPU; this can
+ * happen because of inaccuracies in how we track these -- see
+ * update_irq_load_avg().
+ */
irq = cpu_util_irq(rq);
-
if (unlikely(irq >= max))
return max;

- /* Sum rq utilization */
+ /*
+ * Because the time spent on RT/DL tasks is visible as 'lost' time to
+ * CFS tasks and we use the same metric to track the effective
+ * utilization (PELT windows are synchronized) we can directly add them
+ * to obtain the CPU's actual utilization.
+ */
util = cpu_util_cfs(rq);
util += cpu_util_rt(rq);

/*
- * Interrupt time is not seen by rqs utilization nso we can compare
- * them with the CPU capacity
+ * We do not make cpu_util_dl() a permanent part of this sum because we
+ * want to use cpu_bw_dl() later on, but we need to check if the
+ * CFS+RT+DL sum is saturated (ie. no idle time) such that we select
+ * f_max when there is no idle time.
+ *
+ * NOTE: numerical errors or stop class might cause us to not quite hit
+ * saturation when we should -- something for later.
*/
if ((util + cpu_util_dl(rq)) >= max)
return max;

/*
- * As there is still idle time on the CPU, we need to compute the
- * utilization level of the CPU.
+ * There is still idle time; further improve the number by using the
+ * irq metric. Because IRQ/steal time is hidden from the task clock we
+ * need to scale the task numbers:
*
+ *              max - irq
+ *   U' = irq + --------- * U
+ *                 max
+ */
+ util *= (max - irq);
+ util /= max;
+ util += irq;
+
+ /*
* Bandwidth required by DEADLINE must always be granted while, for
* FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
* to gracefully reduce the frequency when no tasks show up for longer
@@ -217,18 +261,7 @@ static unsigned long sugov_get_util(stru
* util_cfs + util_dl as requested freq. However, cpufreq is not yet
* ready for such an interface. So, we only do the latter for now.
*/
-
- /* Weight rqs utilization to normal context window */
- util *= (max - irq);
- util /= max;
-
- /* Add interrupt utilization */
- util += irq;
-
- /* Add DL bandwidth requirement */
- util += sg_cpu->bw_dl;
-
- return min(max, util);
+ return min(max, util + sg_cpu->bw_dl);
}

/**