Re: [PATCH 1/1] sched/rt: avoid contend with CFS task

From: Dietmar Eggemann
Date: Thu Oct 03 2019 - 12:26:36 EST


[+ Steven Rostedt <rostedt@xxxxxxxxxxx>]

On 29/08/2019 05:15, Jing-Ting Wu wrote:
> In the original Linux design, the RT and CFS schedulers are independent.
> The current RT task placement policy selects the first CPU in
> lowest_mask, even if that CPU is running a CFS task.
> This may place the RT task on a busy CPU and leave the preempted
> CFS task runnable but not running.
>
> So select an idle CPU in lowest_mask first to avoid preempting
> a CFS task.
>
> Signed-off-by: Jing-Ting Wu <jing-ting.wu@xxxxxxxxxxxx>
> ---
> kernel/sched/rt.c | 42 +++++++++++++++++-------------------------
> 1 file changed, 17 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index a532558..626ca27 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1388,7 +1388,6 @@ static void yield_task_rt(struct rq *rq)
> static int
> select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> {
> - struct task_struct *curr;
> struct rq *rq;
>
> /* For anything but wake ups, just return the task_cpu */
> @@ -1398,33 +1397,15 @@ static void yield_task_rt(struct rq *rq)
> rq = cpu_rq(cpu);
>
> rcu_read_lock();
> - curr = READ_ONCE(rq->curr); /* unlocked access */
>
> /*
> - * If the current task on @p's runqueue is an RT task, then
> - * try to see if we can wake this RT task up on another
> - * runqueue. Otherwise simply start this RT task
> - * on its current runqueue.
> - *
> - * We want to avoid overloading runqueues. If the woken
> - * task is a higher priority, then it will stay on this CPU
> - * and the lower prio task should be moved to another CPU.
> - * Even though this will probably make the lower prio task
> - * lose its cache, we do not want to bounce a higher task
> - * around just because it gave up its CPU, perhaps for a
> - * lock?
> - *
> - * For equal prio tasks, we just let the scheduler sort it out.
> - *
> - * Otherwise, just let it ride on the affined RQ and the
> - * post-schedule router will push the preempted task away
> - *
> - * This test is optimistic, if we get it wrong the load-balancer
> - * will have to sort it out.
> + * If the task p is allowed to put more than one CPU or
> + * it is not allowed to put on this CPU.
> + * Let p use find_lowest_rq to choose other idle CPU first,
> + * instead of choose this cpu and preempt curr cfs task.
> */
> - if (curr && unlikely(rt_task(curr)) &&
> - (curr->nr_cpus_allowed < 2 ||
> - curr->prio <= p->prio)) {
> + if ((p->nr_cpus_allowed > 1) ||
> + (!cpumask_test_cpu(cpu, p->cpus_ptr))) {
> int target = find_lowest_rq(p);

I'm fairly sure the RT folks won't like the idea of changing this condition.

I remember a similar approach that Steven Rostedt NAKed back then:

https://lore.kernel.org/r/1415099585-31174-2-git-send-email-pang.xunlei@xxxxxxxxxx

Back then, Xunlei Pang even tried to create a lower mask of idle CPUs,
for find_lower_mask() to return:

https://lore.kernel.org/r/1415099585-31174-1-git-send-email-pang.xunlei@xxxxxxxxxx
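
For readers following along, the placement difference the commit message
describes (first CPU in lowest_mask vs. an idle CPU in lowest_mask) can be
modelled in user space roughly as below. This is only a sketch under
assumptions, not the code from either patch: lowest_mask is reduced to a
plain bitmask, and cpu_is_idle[], pick_first_cpu() and pick_idle_cpu_first()
are hypothetical names that do not exist in kernel/sched/rt.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8

/*
 * User-space model only: lowest_mask is a plain bitmask (bit n == CPU n)
 * and cpu_is_idle[] stands in for the kernel's per-CPU idle state.
 * None of these names match the real kernel data structures.
 */

/* Policy described as current: take the first CPU set in lowest_mask. */
static int pick_first_cpu(uint32_t lowest_mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (lowest_mask & (1u << cpu))
			return cpu;
	return -1;	/* empty mask: no candidate CPU */
}

/*
 * Policy described as proposed: prefer an idle CPU in lowest_mask and
 * fall back to the first CPU in the mask if none of them is idle.
 */
static int pick_idle_cpu_first(uint32_t lowest_mask,
			       const bool cpu_is_idle[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((lowest_mask & (1u << cpu)) && cpu_is_idle[cpu])
			return cpu;
	return pick_first_cpu(lowest_mask);
}
```

With CPUs 1 and 3 in the mask and only CPU 3 idle, the first helper picks
CPU 1 (preempting whatever runs there) while the second picks CPU 3.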

[...]