Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")

From: Niklas Söderlund
Date: Thu Apr 26 2018 - 11:39:12 EST


Hi Vincent,

On 2018-04-26 17:27:24 +0200, Vincent Guittot wrote:
> Hi Niklas,
>
> >> Thanks for the trace, I have been able to catch a problem with it.
> >> Could you test the patch below to confirm that the problem is solved?
> >> The patch applies on top of
> >> c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> >
> > I can confirm that with the patch below I can no longer reproduce the
> > problem. Thanks!
>
> Thanks for testing.
> Do you mind if I add
> Tested-by: Niklas Söderlund <niklas.soderlund@xxxxxxxxxxxx>

Please do.

>
> Peter, Ingo,
> Do you want me to re-send the patch with all tags, or will you take
> this version?
>
> Regards,
> Vincent
>
> >
> >>
> >> From: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >> Date: Thu, 26 Apr 2018 12:19:32 +0200
> >> Subject: [PATCH] sched/fair: fix the update of blocked load when newly idle
> >> MIME-Version: 1.0
> >> Content-Type: text/plain; charset=UTF-8
> >> Content-Transfer-Encoding: 8bit
> >>
> >> With commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle"),
> >> we release the rq->lock when updating blocked load of idle CPUs. This opens
> >> a time window during which another CPU can add a task to this CPU's cfs_rq.
> >> The check for a newly added task in idle_balance() is not in the common path.
> >> Move the out label to include this check.
> >>
> >> Fixes: 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
> >> Reported-by: Heiner Kallweit <hkallweit1@xxxxxxxxx>
> >> Reported-by: Niklas Söderlund <niklas.soderlund@xxxxxxxxxxxx>
> >> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >> ---
> >> kernel/sched/fair.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 0951d1c..15a9f5e 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -9847,6 +9847,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
> >>  	if (curr_cost > this_rq->max_idle_balance_cost)
> >>  		this_rq->max_idle_balance_cost = curr_cost;
> >>
> >> +out:
> >>  	/*
> >>  	 * While browsing the domains, we released the rq lock, a task could
> >>  	 * have been enqueued in the meantime. Since we're not going idle,
> >> @@ -9855,7 +9856,6 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
> >>  	if (this_rq->cfs.h_nr_running && !pulled_task)
> >>  		pulled_task = 1;
> >>
> >> -out:
> >>  	/* Move the next balance forward */
> >>  	if (time_after(this_rq->next_balance, next_balance))
> >>  		this_rq->next_balance = next_balance;
> >> --
> >> 2.7.4
> >>
> >>
> >>
> >> [snip]
> >>
> >
> > --
> > Regards,
> > Niklas Söderlund

--
Regards,
Niklas Söderlund
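
For readers following the thread, here is a rough userspace sketch of
what moving the "out" label changes in the quoted patch above. It is
not kernel code. struct toy_rq, toy_unlocked_window() and both
toy_idle_balance_*() functions are hypothetical names made up for
illustration, and the model assumes that the early-exit path is the one
that updates blocked load with the rq lock dropped. Only the placement
of the h_nr_running check relative to the label is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rq; only what the sketch needs. */
struct toy_rq {
	unsigned int h_nr_running;	/* runnable CFS tasks on this CPU */
	bool take_early_exit;		/* true: skip the domain walk, goto out */
	bool racing_enqueue;		/* true: a remote CPU enqueues a task
					 * while the lock is dropped */
};

/* Models the window in which the rq lock is not held. */
static void toy_unlocked_window(struct toy_rq *rq)
{
	if (rq->racing_enqueue)
		rq->h_nr_running++;
}

/* Label placement before the fix: the early exit skips the check. */
int toy_idle_balance_before(struct toy_rq *rq)
{
	int pulled_task = 0;

	assert(rq != NULL);

	if (rq->take_early_exit) {
		toy_unlocked_window(rq);	/* blocked load update */
		goto out;
	}

	toy_unlocked_window(rq);		/* domain walk */

	/* A task may have been enqueued meanwhile: pretend we pulled it. */
	if (rq->h_nr_running && !pulled_task)
		pulled_task = 1;
out:
	return pulled_task;
}

/* Label placement after the fix: every path runs the check. */
int toy_idle_balance_after(struct toy_rq *rq)
{
	int pulled_task = 0;

	assert(rq != NULL);

	if (rq->take_early_exit) {
		toy_unlocked_window(rq);	/* blocked load update */
		goto out;
	}

	toy_unlocked_window(rq);		/* domain walk */
out:
	/* A task may have been enqueued meanwhile: pretend we pulled it. */
	if (rq->h_nr_running && !pulled_task)
		pulled_task = 1;

	return pulled_task;
}
```

With take_early_exit and racing_enqueue both set on an empty toy_rq,
toy_idle_balance_before() returns 0 even though h_nr_running is now 1,
so the caller would treat the CPU as idle with a runnable task queued.
toy_idle_balance_after() returns 1 for the same input.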