Re: [External] Re: [PATCH] sched/fair: favor non-idle group in tick preemption

From: Hao Jia
Date: Thu Nov 03 2022 - 23:49:30 EST




On 2022/11/2 Josh Don wrote:
Some weirdness about this change though, is that if there is a
non-idle current entity, and the two next entities on the cfs_rq are
idle and non-idle respectively, we'll now take longer to preempt the
on-cpu non-idle entity, because the non-idle entity on the cfs_rq is
'hidden' by the idle 'first' entity. Wakeup preemption is different
because we're always directly comparing the current entity with the
newly woken entity.

You are right, this can happen with high probability.
This patch only compares curr with the first entity at the
tick, and it seems hard to take all the other entities in the
cfs_rq into account.
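
To make the "hidden" case concrete, here is a rough user-space sketch. It is not the patch itself and not kernel code: struct entity, tick_should_preempt() and the favor_non_idle flag are hypothetical names, and the rule "a non-idle curr is not preempted at the tick on behalf of an idle first entity" is only an assumption about how such favoring might behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified entity; NOT the kernel's struct sched_entity. */
struct entity {
	int64_t vruntime;	/* virtual runtime, in ns */
	bool idle;		/* true for a SCHED_IDLE-like entity */
};

/*
 * Toy tick check. Only 'first' (the leftmost queued entity) is examined,
 * so any entity queued behind it cannot influence the decision.
 *
 * ASSUMPTION: with favor_non_idle set, a non-idle curr is never preempted
 * at the tick on behalf of an idle first entity.
 */
bool tick_should_preempt(const struct entity *curr,
			 const struct entity *first,
			 int64_t ideal_runtime,
			 bool favor_non_idle)
{
	assert(ideal_runtime > 0);

	if (favor_non_idle && !curr->idle && first->idle)
		return false;

	/* Preempt once curr has run far enough ahead of the leftmost entity. */
	return curr->vruntime - first->vruntime > ideal_runtime;
}
```

With curr non-idle at vruntime 10ms, an idle first entity at 1ms and an ideal runtime of 3ms, the check preempts when favor_non_idle is false and does not when it is true. A second, non-idle entity queued right behind the idle one is never looked at in either case, which is the situation Josh describes.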

So, what specific negative effects would this situation cause?
For example, would the "hidden" non-idle entity's latency be worse
than before?

As Abel points out in his email, it can push out the time it'll take
to switch to the other non-idle entity. The change might boost some
benchmark numbers, but I don't think it is conclusive enough to say
it is a generically beneficial improvement that should be integrated.

By the way, I'm curious if you modified any of the sched_idle_cpu()
and related load balancing around idle entities given that you've made
it so that idle entities can have arbitrary weight (since, as I
described in my prior email, this can otherwise cause issues there).

If we want to make it easier for non-idle tasks to preempt idle tasks
at the tick, maybe we can consider lowering
sysctl_sched_idle_min_granularity. Of course, this does not guarantee
that non-idle tasks will preempt idle tasks every time, but it seems
to be more beneficial for non-idle tasks.

IMHO, even if increasing the weight of non-idle entities is allowed,
it seems that we can still make it easier for non-idle tasks to
preempt idle tasks by lowering sysctl_sched_idle_min_granularity.
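
As a rough illustration of why the knob matters, here is a simplified user-space model of the slice floor that the tick compares against. It only loosely follows my understanding of sched_slice() and check_preempt_tick(); struct tunables, ideal_runtime() and tick_preempts_curr() are made-up names, the values are arbitrary, and the real code has more conditions (for example, the idle floor is not used when the whole cfs_rq is idle).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative tunables in ns; the values used with them are examples only. */
struct tunables {
	uint64_t min_granularity;	/* slice floor for non-idle entities */
	uint64_t idle_min_granularity;	/* slice floor for idle entities */
	uint64_t latency;		/* upper bound on the slice */
};

/*
 * Slice that curr may run before the tick preempts it: the weight-based
 * slice, raised to the per-class floor and capped at the latency target.
 */
uint64_t ideal_runtime(uint64_t weighted_slice, bool curr_idle,
		       const struct tunables *t)
{
	uint64_t floor = curr_idle ? t->idle_min_granularity
				   : t->min_granularity;
	uint64_t slice = weighted_slice > floor ? weighted_slice : floor;

	assert(t->latency > 0);
	return slice < t->latency ? slice : t->latency;
}

/* True if curr has used up its slice by the time the tick fires. */
bool tick_preempts_curr(uint64_t delta_exec, uint64_t weighted_slice,
			bool curr_idle, const struct tunables *t)
{
	return delta_exec > ideal_runtime(weighted_slice, curr_idle, t);
}
```

In this model, an idle curr with a tiny weight-based slice that has run for 1ms is not preempted when the idle floor is 2ms, but is preempted when the floor is lowered to 0.75ms. The floor only helps while the weight-based slice is below it.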

Thanks,
Hao