Re: [PATCH] sched/fair: favor non-idle group in tick preemption

From: Josh Don
Date: Fri Nov 11 2022 - 14:15:14 EST


On Thu, Nov 10, 2022 at 7:50 PM Abel Wu <wuyun.abel@xxxxxxxxxxxxx> wrote:
>
> > By the way, I'm curious if you modified any of the sched_idle_cpu()
> > and related load balancing around idle entities given that you've made
> > it so that idle entities can have arbitrary weight (since, as I
> > described in my prior email, this can otherwise cause issues there).
>
> Being able to change idle entities' weight brings nothing but
> convenience, because the same effect can be achieved by modifying all
> their siblings' weights. That doesn't seem a strong reason to merge it.
>
> And I'm also thinking that, although rare, a non-idle group can also
> have a weight close to, or even equal to, 3. I guess some users who chose
> this kind of setting might only want to benefit from the preemption
> at wakeup? Nevertheless, this setting is supported now :)

Strongly disagree with this; part of the semantics of idle relies on
the minimum weight value. It is true that this behavior gets a little
weirder if siblings also have close to the minimum weight, but that is an
artifact of SCHED_IDLE being built into CFS rather than being a
separate scheduling class. More generally, load balancing and related
code assume that idle entities have the minimum weight when deciding
where to place non-idle entities.

For consistency with the per-task SCHED_IDLE behavior, which
effectively gives idle tasks the max nice value, the cgroup idle
behavior needs to match it.

I think the use case you're describing would honestly be better served
by extending SCHED_BATCH with the preemption properties and using
that, rather than trying to overload SCHED_IDLE here. There's a
difference between non-interactive entities that are OK with being
aggressively preempted and "idle" entities that should really only be
soaking up the remaining cycles on the machine.

Best,
Josh