Re: [PATCH 1/2] sched/fair: account update_blocked_averages in newidle_balance cost

From: Peter Zijlstra
Date: Tue Oct 05 2021 - 16:42:13 EST


On Mon, Oct 04, 2021 at 07:14:50PM +0200, Vincent Guittot wrote:
> The time spent updating the blocked load can be significant depending on
> the complexity of the cgroup hierarchy. Take this time into account when
> deciding to stop newidle_balance() because it exceeds the expected idle
> time.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8943dbb94365..1f78b2e3b71c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10810,7 +10810,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> int this_cpu = this_rq->cpu;
> struct sched_domain *sd;
> int pulled_task = 0;
> - u64 curr_cost = 0;
> + u64 t0, domain_cost, curr_cost = 0;
>
> update_misfit_status(NULL, this_rq);
>
> @@ -10855,11 +10855,14 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>
> raw_spin_rq_unlock(this_rq);
>
> + t0 = sched_clock_cpu(this_cpu);
> update_blocked_averages(this_cpu);
> + domain_cost = sched_clock_cpu(this_cpu) - t0;
> + curr_cost += domain_cost;
> +
> rcu_read_lock();
> for_each_domain(this_cpu, sd) {
> int continue_balancing = 1;
> - u64 t0, domain_cost;
>
> if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> update_next_balance(sd, &next_balance);

Does this make sense? It avoids a bunch of clock calls (and thereby
accounts more of the time actually spent).

Also, perhaps we should use an asymmetric IIR filter instead of a strict
MAX filter for max_newidle_lb_cost.
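
Something like the sketch below is what I have in mind -- completely
untested, and the update_newidle_cost() helper name and the 1/256 decay
rate are made up for illustration:

static inline void update_newidle_cost(struct sched_domain *sd, u64 cost)
{
	if (cost > sd->max_newidle_lb_cost) {
		/* Track a new peak cost immediately. */
		sd->max_newidle_lb_cost = cost;
	} else {
		/*
		 * Otherwise decay slowly towards the observed cost, so a
		 * single outlier doesn't inflate the max forever; 1/256
		 * of the difference per update is an arbitrary rate.
		 */
		sd->max_newidle_lb_cost -=
			(sd->max_newidle_lb_cost - cost) >> 8;
	}
}

That keeps the reaction to a cost increase instantaneous while letting the
estimate drift back down over successive newidle passes.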

---
Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -10759,9 +10759,9 @@ static int newidle_balance(struct rq *th
{
unsigned long next_balance = jiffies + HZ;
int this_cpu = this_rq->cpu;
+ u64 t0, t1, curr_cost = 0;
struct sched_domain *sd;
int pulled_task = 0;
- u64 t0, domain_cost, curr_cost = 0;

update_misfit_status(NULL, this_rq);

@@ -10808,8 +10808,9 @@ static int newidle_balance(struct rq *th

t0 = sched_clock_cpu(this_cpu);
update_blocked_averages(this_cpu);
- domain_cost = sched_clock_cpu(this_cpu) - t0;
- curr_cost += domain_cost;
+ t1 = sched_clock_cpu(this_cpu);
+ curr_cost += t1 - t0;
+ t0 = t1;

rcu_read_lock();
for_each_domain(this_cpu, sd) {
@@ -10821,17 +10822,19 @@ static int newidle_balance(struct rq *th
}

if (sd->flags & SD_BALANCE_NEWIDLE) {
- t0 = sched_clock_cpu(this_cpu);
+ u64 domain_cost;

pulled_task = load_balance(this_cpu, this_rq,
sd, CPU_NEWLY_IDLE,
&continue_balancing);

- domain_cost = sched_clock_cpu(this_cpu) - t0;
+ t1 = sched_clock_cpu(this_cpu);
+ domain_cost = t1 - t0;
if (domain_cost > sd->max_newidle_lb_cost)
sd->max_newidle_lb_cost = domain_cost;

curr_cost += domain_cost;
+ t0 = t1;
}

update_next_balance(sd, &next_balance);