Re: [PATCH 1/2] sched/fair: skip newidle update stats

From: Vincent Guittot
Date: Fri Nov 12 2021 - 09:47:37 EST


On Fri, 12 Nov 2021 at 15:29, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 12, 2021 at 10:58:56AM +0100, Vincent Guittot wrote:
> > In case we skip the newly idle LB entirely, or abort it because we are
> > going to exceed avg_idle, we have to make sure not to start an update
> > of the blocked load when entering idle.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 13950beb01a2..a162b0ec8963 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10861,7 +10861,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >  	int this_cpu = this_rq->cpu;
> >  	u64 t0, t1, curr_cost = 0;
> >  	struct sched_domain *sd;
> > -	int pulled_task = 0;
> > +	int pulled_task = 0, early_stop = 0;
> >
> >  	update_misfit_status(NULL, this_rq);
> >
> > @@ -10898,8 +10898,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >  	if (!READ_ONCE(this_rq->rd->overload) ||
> >  	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
> >
> > -		if (sd)
> > +		if (sd) {
> >  			update_next_balance(sd, &next_balance);
> > +
> > +			/*
> > +			 * We skip the new idle LB because there is not enough
> > +			 * time before the next wakeup. Make sure that we will
> > +			 * not kick NOHZ_NEWILB_KICK.
> > +			 */
> > +			early_stop = 1;
> > +		}
> >  		rcu_read_unlock();
> >
> >  		goto out;
> > @@ -10918,8 +10926,10 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >
> >  		update_next_balance(sd, &next_balance);
> >
> > -		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
> > +		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> > +			early_stop = 1;
> >  			break;
> > +		}
> >
> >  		if (sd->flags & SD_BALANCE_NEWIDLE) {
> >
> > @@ -10969,7 +10979,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >
> >  	if (pulled_task)
> >  		this_rq->idle_stamp = 0;
> > -	else
> > +	else if (!early_stop)
> >  		nohz_newidle_balance(this_rq);
> >
> >  	rq_repin_lock(this_rq, rf);
>
> Urgh code flow is a mess... Let me see if I can fix some of that.

yeah, I haven't found a better way
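
For reference, the resulting flow is roughly the following (a condensed
sketch pieced together from the hunks above, with unrelated details
elided; not verbatim kernel code):

static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
{
	int pulled_task = 0, early_stop = 0;
	...
	if (!READ_ONCE(this_rq->rd->overload) ||
	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
		if (sd) {
			update_next_balance(sd, &next_balance);
			/* skipped entirely: too little idle time */
			early_stop = 1;
		}
		rcu_read_unlock();
		goto out;
	}
	...
	for_each_domain(this_cpu, sd) {
		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
			/* aborted mid-walk for the same reason */
			early_stop = 1;
			break;
		}
		...
	}
	...
out:
	...
	if (pulled_task)
		this_rq->idle_stamp = 0;
	else if (!early_stop)
		/* only kick the blocked-load update when the LB really ran */
		nohz_newidle_balance(this_rq);
	...
	return pulled_task;
}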

>
> Anyway, does nohz_newidle_balance() want to lose its ->avg_idle test
> with this on?

This test still covers the cases where the newly idle balance is short.
It errs on the conservative side: people have never complained that the
update of the blocked load average of idle CPUs doesn't happen often
enough; it's most often the opposite.
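
For reference, the ->avg_idle test in question is the bail-out at the
top of nohz_newidle_balance(); it looks roughly like this in the
v5.15-era tree (a sketch, so double-check against the source):

static void nohz_newidle_balance(struct rq *this_rq)
{
	int this_cpu = this_rq->cpu;

	/* This CPU doesn't want to be disturbed by scheduler housekeeping */
	if (!housekeeping_cpu(this_cpu, HK_FLAG_SCHED))
		return;

	/* Will wake up very soon. No time for doing anything else */
	if (this_rq->avg_idle < sysctl_sched_migration_cost)
		return;

	/* No blocked load of idle CPUs needs updating right now */
	if (!READ_ONCE(nohz.has_blocked) ||
	    time_before(jiffies, READ_ONCE(nohz.next_blocked)))
		return;

	/* Request an ILB to update blocked load before entering idle */
	atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));
}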