Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

From: Peter Zijlstra
Date: Tue Dec 13 2022 - 06:04:54 EST


On Mon, Dec 12, 2022 at 11:33:12AM -1000, Tejun Heo wrote:

> > > @@ -5800,10 +5812,13 @@ static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
> > > * We can terminate the balance pass as soon as we know there is
> > > * a runnable task of @class priority or higher.
> > > */
> > > - for_class_range(class, prev->sched_class, &idle_sched_class) {
> > > + for_balance_class_range(class, prev->sched_class, &idle_sched_class) {
> > > if (class->balance(rq, prev, rf))
> > > break;
> > > }
> > > +#else
> > > + /* SCX needs the balance call even in UP, call it explicitly */
> >
> > This, *WHY* !?!
>
> This comes from the fact that there are no strict rq boundaries and the BPF
> scheduler can share scheduling queues across any subset of CPUs however they
> see fit. So, task <-> rq association is flexible until the very last moment
> the task gets picked for execution. For the dispatch path to support this,
> it needs to be able to migrate tasks across rq's which requires unlocking
> the current rq which can only be done in ->balance(). So, sched_ext uses
> ->balance() to run the dispatch path and thus needs it called even on UP.

Fundamentally none of that makes sense, on UP there is no placement,
there is only the one CPU, very little choice to be had.

> Given that UP doesn't need to transfer tasks across, it might be possible to
> move the whole dispatch operation into ->pick_next_task() but the current
> state would be different, so it's more complicated and will likely be more
> brittle.

That sounds like something is amiss, you fundamentally hold all the
right locks, there is only one.