Re: [PATCH v2] sched/pelt: fix update_blocked_averages() for dl and rt

From: Peter Zijlstra
Date: Fri Aug 31 2018 - 11:11:01 EST


On Fri, Aug 31, 2018 at 05:07:21PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 31, 2018 at 04:56:19PM +0200, Vincent Guittot wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 309c93f..bc1de21 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7262,6 +7262,7 @@ static void update_blocked_averages(int cpu)
> > {
> > struct rq *rq = cpu_rq(cpu);
> > struct cfs_rq *cfs_rq, *pos;
> > + const struct sched_class *curr_class = rq->curr->sched_class;
> > struct rq_flags rf;
> > bool done = true;
>
> Can you do me a v3 where you move that rq->curr dereference under the
> rq->lock?
>
> I _think_ it is actually OK, but it is really dodgy. Moving it under
> rq->lock makes it obviously correct.

Ah, it is not correct. I only checked to see if preemption was disabled
and if we're calling this on the local CPU (which I think is true),
which would guarantee rq->curr's existence.

But that is not sufficient to make rq->curr->sched_class stable.
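
To illustrate the shape of the requested change, here is a minimal
userspace sketch, not the actual v3. It assumes that anything changing a
task's class does so while holding the runqueue lock, so a read taken
under that same lock cannot race with the change. A pthread mutex stands
in for rq->lock; the structs are simplified stand-ins for the kernel's,
and update_blocked_averages_sketch(), change_class() and the
rt_running/dl_running fields are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct sched_class { const char *name; };

static const struct sched_class rt_sched_class   = { "rt" };
static const struct sched_class dl_sched_class   = { "dl" };
static const struct sched_class fair_sched_class = { "fair" };

struct task { const struct sched_class *sched_class; };

struct rq {
	pthread_mutex_t lock;	/* stands in for rq->lock */
	struct task *curr;
	bool rt_running;	/* hypothetical: value that would be passed as 'running' */
	bool dl_running;
};

/* Hypothetical writer: changes a task's class only with the lock held. */
static void change_class(struct rq *rq, struct task *p,
			 const struct sched_class *class)
{
	pthread_mutex_lock(&rq->lock);
	p->sched_class = class;
	pthread_mutex_unlock(&rq->lock);
}

/*
 * Hypothetical reader: the rq->curr->sched_class dereference happens
 * after the lock is taken, so it cannot observe a class change in flight.
 */
static void update_blocked_averages_sketch(struct rq *rq)
{
	const struct sched_class *curr_class;

	pthread_mutex_lock(&rq->lock);
	curr_class = rq->curr->sched_class;
	rq->rt_running = (curr_class == &rt_sched_class);
	rq->dl_running = (curr_class == &dl_sched_class);
	pthread_mutex_unlock(&rq->lock);
}
```

In the patch itself this would presumably amount to declaring curr_class
without an initializer and assigning it after the lock is acquired.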

> > @@ -7298,8 +7299,8 @@ static void update_blocked_averages(int cpu)
> > if (cfs_rq_has_blocked(cfs_rq))
> > done = false;
> > }
> > - update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
> > - update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
> > + update_rt_rq_load_avg(rq_clock_task(rq), rq, curr_class == &rt_sched_class);
> > + update_dl_rq_load_avg(rq_clock_task(rq), rq, curr_class == &dl_sched_class);
> > update_irq_load_avg(rq, 0);
> > /* Don't need periodic decay once load/util_avg are null */
> > if (others_have_blocked(rq))