Re: [PATCH v7 23/23] sched: Fix rt/dl load balancing via chain level balance

From: John Stultz
Date: Thu Jan 04 2024 - 22:42:57 EST


On Fri, Dec 22, 2023 at 6:51 AM Metin Kaya <metin.kaya@xxxxxxx> wrote:
> On 20/12/2023 12:18 am, John Stultz wrote:
> > +        /*
> > +         * Chain leads off the rq, we're free to push it anywhere.
> > +         *
> > +         * One wrinkle with relying on find_exec_ctx is that when the chain
> > +         * leads to a task currently migrating to rq, we see the chain as
> > +         * pushable & push everything prior to the migrating task. Even if
> > +         * we checked explicitly for this case, we could still race with a
> > +         * migration after the check.
> > +         * This shouldn't permanently produce a bad state though, as proxy()
>
> find_proxy_task()

Fixed.

> > +#ifdef CONFIG_SCHED_PROXY_EXEC
> >  static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *rq,
> > -                                          struct rq *later)
> > +                                          struct rq *later, bool *retry)
> > +{
> > +        if (!dl_task(task) || is_migration_disabled(task))
> > +                return false;
> > +
> > +        if (rq != this_rq()) {
> > +                struct task_struct *next_task = pick_next_pushable_dl_task(rq);
> > +
> > +                if (next_task == task) {
>
> Nit: We can `return false;` if next_task != task and save one level of
> indentation.

Ah, good point. Fixed.
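With the early return, that chunk ends up as something like (roughly,
untested):

        struct task_struct *next_task = pick_next_pushable_dl_task(rq);

        if (next_task != task)
                return false;

and the find_exec_ctx() logic quoted below loses a level of indentation.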

> > +                        struct task_struct *exec_ctx;
> > +
> > +                        exec_ctx = find_exec_ctx(rq, next_task);
> > +                        *retry = (exec_ctx && !cpumask_test_cpu(later->cpu,
> > +                                                                &exec_ctx->cpus_mask));
> > +                } else {
> > +                        return false;
> > +                }
> > +        } else {
> > +                int pushable = task_is_pushable(rq, task, later->cpu);
> > +
> > +                *retry = pushable == -1;
> > +                if (!pushable)
> > +                        return false;
>
> `return pushable;` can replace above 2 lines.
> The same for rt_revalidate_rq_state().

Hrm. It does save lines, but I worry it makes the code a touch harder
to read. I'll hold off on this for the moment unless someone else
pushes for it.
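
For reference, since task_is_pushable() returns -1/0/1, the suggestion
leans on the implicit int-to-bool conversion: it would turn

        *retry = pushable == -1;
        if (!pushable)
                return false;

into

        *retry = pushable == -1;
        return pushable;

which should be equivalent (assuming nothing else runs before the final
return true), but the reader has to stop and work out that both -1 and
1 collapse to true.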

> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index fabb19891e95..d5ce95dc5c09 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
...
> > +#ifdef CONFIG_SCHED_PROXY_EXEC
> > +static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *rq,
> > +                                          struct rq *lowest, bool *retry)
>
> This function can be consolidated with dl_revalidate_rq_state() as you
> noted in the previous patch, although rt_revalidate_rq_state() has few
> comments.

Yeah. I need to stare at it a bit to figure out the best name for the
common chunk. I'd also like to find a better way to handle the retry
logic, as it feels messy.
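
Maybe something like a shared helper, with thin rt/dl wrappers doing
the class check and picking the pushable task? Totally untested sketch,
and the name is just a placeholder:

        static inline bool revalidate_rq_state(struct task_struct *task, struct rq *rq,
                                               struct rq *target, bool *retry,
                                               struct task_struct *next_task)
        {
                if (rq != this_rq()) {
                        struct task_struct *exec_ctx;

                        if (next_task != task)
                                return false;

                        exec_ctx = find_exec_ctx(rq, next_task);
                        *retry = (exec_ctx && !cpumask_test_cpu(target->cpu,
                                                                &exec_ctx->cpus_mask));
                } else {
                        int pushable = task_is_pushable(rq, task, target->cpu);

                        *retry = pushable == -1;
                        if (!pushable)
                                return false;
                }
                return true;
        }

One wrinkle: the wrappers would end up calling their pick_next_pushable*()
helper even on the this_rq() path where it isn't needed, unless we pass
a callback instead. So maybe not a clear win.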

> > +{
> > +        /*
> > +         * Releasing the rq lock means we need to re-check pushability.
> > +         * Some scenarios:
> > +         * 1) If a migration from another CPU sent a task/chain to rq
> > +         *    that made task newly unpushable by completing a chain
> > +         *    from task to rq->curr, then we need to bail out and push something
> > +         *    else.
> > +         * 2) If our chain led off this CPU or to a dequeued task, the last waiter
> > +         *    on this CPU might have acquired the lock and woken (or even migrated
> > +         *    & run, handed off the lock it held, etc...). This can invalidate the
> > +         *    result of find_lowest_rq() if our chain previously ended in a blocked
> > +         *    task whose affinity we could ignore, but now ends in an unblocked
> > +         *    task that can't run on lowest_rq.
> > +         * 3) Race described at https://lore.kernel.org/all/1523536384-26781-2-git-send-email-huawei.libin@xxxxxxxxxx/
> > +         *
> > +         * Notes on these:
> > +         * - Scenario #2 is properly handled by rerunning find_lowest_rq
> > +         * - Scenario #1 requires that we fail
> > +         * - Scenario #3 can AFAICT only occur when rq is not this_rq(). And the
> > +         *   suggested fix is not universally correct now that push_cpu_stop() can
> > +         *   call this function.
> > +         */
> > +        if (!rt_task(task) || is_migration_disabled(task)) {
> > +                return false;
> > +        } else if (rq != this_rq()) {
>
> Nit: `else` can be dropped as in dl_revalidate_rq_state().

Ack. Done.
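i.e. (roughly):

        if (!rt_task(task) || is_migration_disabled(task))
                return false;

        if (rq != this_rq()) {
                ...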

Thanks again for all the feedback!
-john