Re: [RT] BUG in sched/cpupri.c

From: John Keeping
Date: Fri Jan 07 2022 - 06:49:55 EST


On Fri, Jan 07, 2022 at 11:46:45AM +0100, Dietmar Eggemann wrote:
> On 22/12/2021 20:48, Valentin Schneider wrote:
> > /*
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index ef8228d19382..8f3e3a1367b6 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1890,6 +1890,16 @@ static int push_rt_task(struct rq *rq, bool pull)
> >  	if (!next_task)
> >  		return 0;
> >
> > +	/*
> > +	 * It's possible that the next_task slipped in of higher priority than
> > +	 * current, or current has *just* changed priority. If that's the case
> > +	 * just reschedule current.
> > +	 */
> > +	if (unlikely(next_task->prio < rq->curr->prio)) {
> > +		resched_curr(rq);
> > +		return 0;
> > +	}
>
> IMHO, that's the bit which prevents the BUG.
>
> But this would also prevent the case in which rq->curr is an RT task
> with lower prio than next_task.
>
> Also `rq->curr = migration/X` still goes through, which is somewhat fine
> since find_lowest_rq() bails out if (task->nr_cpus_allowed == 1).
>
> And DL tasks (like sugov:X) go through, and they can have
> task->nr_cpus_allowed > 1 (arm64 slow-switching boards with shared
> frequency domains with schedutil). cpupri_find_fitness()->convert_prio()
> can handle task_pri, p->prio = -1 (CPUPRI_INVALID), although it's somewhat
> by coincidence.
>
> So maybe something like this:

Do you mean to replace just the one hunk from Valentin's patch with the
change below (keeping the rest), or are you saying that only the change
below is needed?

> @@ -1898,6 +1898,11 @@ static int push_rt_task(struct rq *rq, bool pull)
>  		if (!pull || rq->push_busy)
>  			return 0;
>
> +		if (rq->curr->sched_class != &rt_sched_class) {
> +			resched_curr(rq);
> +			return 0;
> +		}
> +
>  		cpu = find_lowest_rq(rq->curr);
>
> [...]
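
For reference, this is how I read the difference between the two checks, as a
rough user-space model rather than kernel code. The enum, struct and function
names are made up for illustration. The prio values are only examples, apart
from the DL value of -1 mentioned above. Lower prio value means higher
priority, as in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduling classes discussed above. */
enum model_class { MODEL_DL, MODEL_RT, MODEL_FAIR };

/* Hypothetical, stripped-down task; not the kernel's task_struct. */
struct model_task {
	enum model_class cls;
	int prio;	/* lower value = higher priority */
};

/* Mirrors the condition in Valentin's hunk: next_task->prio < rq->curr->prio */
static bool prio_guard_fires(const struct model_task *next_task,
			     const struct model_task *curr)
{
	return next_task->prio < curr->prio;
}

/* Mirrors the condition in Dietmar's hunk: rq->curr is not an RT task */
static bool class_guard_fires(const struct model_task *curr)
{
	return curr->cls != MODEL_RT;
}

/* Walks through the three cases raised in the thread. */
void check_examples(void)
{
	struct model_task next    = { MODEL_RT,   50 };	/* example RT prio */
	struct model_task curr_dl = { MODEL_DL,   -1 };	/* DL: prio == -1 */
	struct model_task curr_rt = { MODEL_RT,   80 };	/* lower-prio RT task */
	struct model_task curr_cfs = { MODEL_FAIR, 120 };	/* example fair task */

	/* DL current: the prio check lets it through, the class check bails. */
	assert(!prio_guard_fires(&next, &curr_dl));
	assert(class_guard_fires(&curr_dl));

	/* Lower-prio RT current: the prio check bails, the class check does not. */
	assert(prio_guard_fires(&next, &curr_rt));
	assert(!class_guard_fires(&curr_rt));

	/* Fair current: both checks bail. */
	assert(prio_guard_fires(&next, &curr_cfs));
	assert(class_guard_fires(&curr_cfs));
}
```

If that model is right, the two checks are not equivalent, which is why I am
asking whether one is meant to replace the other.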