Re: [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation

From: Peter Zijlstra
Date: Fri Oct 13 2017 - 16:41:41 EST


On Fri, Oct 13, 2017 at 05:22:54PM +0200, Vincent Guittot wrote:
>
> I have studied a bit more how to improve the propagation formula and the
> changes below is doing the job for the UCs that I have tested.
>
> Unlike running, we can't directly propagate the runnable through hierarchy
> when we migrate a task. Instead we must ensure that we will not
> over/underestimate the impact of the migration thanks to several rules:
> - ge->avg.runnable_sum can't be higher than LOAD_AVG_MAX
> - ge->avg.runnable_sum can't be lower than ge->avg.running_sum (once scaled to
> the same range)
> - we can't directly propagate a negative delta of runnable_sum because part of
> this runnable time can be "shared" with others sched_entities and stays on the
> gcfs_rq.

Right, that's about how far I got.

> - ge->avg.runnable_sum can't increase when we detach a task.

Yeah, that would be fairly broken.

> Instead, we can't estimate the new runnable_sum of the gcfs_rq with

s/can't/can/ ?

> the formula:
>
> gcfs_rq's runnable sum = gcfs_rq's load_sum / gcfs_rq's weight.

That might be the best we can do.. its wrong, but then its less wrong
that what we have now. The comments can be much improved though. Not to
mention that the big comment on top needs a little help.

> ---
> kernel/sched/fair.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 45 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 350dbec0..a063b048 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3489,33 +3489,67 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
> static inline void
> update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> {
> + long running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
> + long runnable_load_avg, delta_avg, load_avg;
> + s64 runnable_load_sum, delta_sum, load_sum = 0;
>
> if (!runnable_sum)
> return;
>
> gcfs_rq->prop_runnable_sum = 0;
>
> + /*
> + * Get a rough estimate of gcfs_rq's runnable
> + * This is a low guess as it assumes that tasks are equally
> + * runnable which is not true but we can't really do better
> + */
> + if (scale_load_down(gcfs_rq->load.weight)) {
> + load_sum = div_s64(gcfs_rq->avg.load_sum,
> + scale_load_down(gcfs_rq->load.weight));
}
> +
> + /*
> + * Propating a delta of runnable is not just adding it to ge's
> + * runnable_sum:
> + * - Adding a delta runnable can't make ge's runnable_sum higher than
> + * LOAD_AVG_MAX
> + * - We can't directly remove a delta of runnable from
> + * ge's runnable_sum but we can only guest estimate what runnable
> + * will become thanks to few simple rules:
> + * - gcfs_rq's runnable is a good estimate
> + * - ge's runnable_sum can't increase when we remove runnable
> + * - runnable_sum can't be lower than running_sum
> + */
> + if (runnable_sum >= 0) {
> + runnable_sum += se->avg.load_sum;
> + runnable_sum = min(runnable_sum, LOAD_AVG_MAX);
> + } else {
> + runnable_sum = min(se->avg.load_sum, load_sum);
}
> +
> + running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
> + runnable_sum = max(runnable_sum, running_sum);
> +
> load_sum = (s64)se_weight(se) * runnable_sum;
> load_avg = div_s64(load_sum, LOAD_AVG_MAX);
>
> + delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
> + delta_avg = load_avg - se->avg.load_avg;
>
> + se->avg.load_sum = runnable_sum;
> + se->avg.load_avg = load_avg;
> + add_positive(&cfs_rq->avg.load_avg, delta_avg);
> + add_positive(&cfs_rq->avg.load_sum, delta_sum);
>
> runnable_load_sum = (s64)se_runnable(se) * runnable_sum;
> runnable_load_avg = div_s64(runnable_load_sum, LOAD_AVG_MAX);
> + delta_sum = runnable_load_sum - se_weight(se) * se->avg.runnable_load_sum;
> + delta_avg = runnable_load_avg - se->avg.runnable_load_avg;
>
> + se->avg.runnable_load_sum = runnable_sum;
> + se->avg.runnable_load_avg = runnable_load_avg;
>
> if (se->on_rq) {
> + add_positive(&cfs_rq->avg.runnable_load_avg, delta_avg);
> + add_positive(&cfs_rq->avg.runnable_load_sum, delta_sum);
> }
> }
>
> --
> 2.7.4
>