Re: [PATCH 4/5] sched/rt: Consider deadline tasks in cpupri_find()

From: Peter Zijlstra
Date: Tue Jan 27 2015 - 07:59:11 EST


On Mon, Jan 19, 2015 at 04:49:39AM +0000, Xunlei Pang wrote:
> Currently, RT global scheduling doesn't factor in deadline
> tasks, which may cause some problems.
>
> See a case below:
> On a 3 CPU system, CPU0 has one running deadline task,
> CPU1 has one running low priority RT task or is idle, CPU2
> has one running high priority RT task. When another mid
> priority RT task is woken on CPU2, it will be pushed to
> CPU0 (this also disturbs the deadline task on CPU0), while
> it is more reasonable to put it on CPU1.
>
> This patch eliminates this issue by filtering CPUs that
> have runnable deadline tasks, using cpudl->free_cpus in
> cpupri_find().

Not a bad idea. Cc'ed Steve, who likes to look after the RT bits;
the excessive quoting is for his benefit.

> NOTE: We want to make the most use of the percpu local_cpu_mask
> to save an extra mask allocation, so always pass a non-NULL
> lowest_mask to cpupri_find().
>
> Signed-off-by: Xunlei Pang <pang.xunlei@xxxxxxxxxx>
> ---
> kernel/sched/core.c | 2 ++
> kernel/sched/cpupri.c | 22 +++++-----------------
> kernel/sched/cpupri.h | 1 +
> kernel/sched/rt.c | 9 +++++----
> 4 files changed, 13 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ade2958..48c9576 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5652,6 +5652,8 @@ static int init_rootdomain(struct root_domain *rd)
>
> if (cpupri_init(&rd->cpupri) != 0)
> goto free_rto_mask;
> +
> + rd->cpupri.cpudl = &rd->cpudl;

This is disgusting though; it breaks the cpupri abstraction. Why not pass
in the mask in the one place you actually need it?
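
As a rough illustration of that alternative (a user-space toy, not kernel
code): the find routine can take the filter mask as an extra argument, so
the priority structure never has to hold a pointer into the deadline
bookkeeping. Everything named here is hypothetical: toy_cpupri,
toy_cpupri_find(), the extra_mask parameter, the four-level priority scale
and the 32-bit mask type are invented for the sketch. It also assumes that
a CPU running only a deadline task looks idle from the RT priority view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PRIO 4            /* hypothetical scale: 0 = idle ... 3 = highest */

typedef uint32_t toy_cpumask_t;  /* toy mask: bit n stands for CPU n */

/* Toy stand-in for the priority map; it knows nothing about deadline tasks. */
struct toy_cpupri {
	toy_cpumask_t pri_to_cpu[TOY_NR_PRIO]; /* CPUs currently at each level */
};

/*
 * Look for CPUs running at a level below task_prio.
 *
 * allowed:     CPUs the task may run on
 * extra_mask:  caller-supplied filter, e.g. "CPUs without deadline tasks"
 *              (assumption: the caller owns this knowledge, not the map)
 * lowest_mask: output, must not be NULL
 *
 * Returns 1 and fills *lowest_mask on success, 0 if nothing qualifies.
 */
static int toy_cpupri_find(const struct toy_cpupri *cp, int task_prio,
			   toy_cpumask_t allowed, toy_cpumask_t extra_mask,
			   toy_cpumask_t *lowest_mask)
{
	assert(lowest_mask != NULL);

	for (int prio = 0; prio < task_prio && prio < TOY_NR_PRIO; prio++) {
		toy_cpumask_t m = cp->pri_to_cpu[prio] & allowed & extra_mask;

		if (!m)
			continue;    /* nothing usable at this level */

		*lowest_mask = m;
		return 1;
	}
	return 0;
}
```

With CPU0 at level 0 (deadline task only), CPU1 at level 1 and the third
CPU at level 3, a level-2 task with an all-ones extra_mask lands on CPU0;
passing an extra_mask with bit 0 cleared selects CPU1 instead, and the map
itself stays untouched.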

> return 0;
>
> free_rto_mask:
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index 981fcd7..40b8e81 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -32,6 +32,7 @@
> #include <linux/sched/rt.h>
> #include <linux/slab.h>
> #include "cpupri.h"
> +#include "cpudeadline.h"
>
> /* Convert between a 140 based task->prio, and our 102 based cpupri */
> static int convert_prio(int prio)
> @@ -54,7 +55,7 @@ static int convert_prio(int prio)
> * cpupri_find - find the best (lowest-pri) CPU in the system
> * @cp: The cpupri context
> * @p: The task
> - * @lowest_mask: A mask to fill in with selected CPUs (or NULL)
> + * @lowest_mask: A mask to fill in with selected CPUs (not NULL)
> *
> * Note: This function returns the recommended CPUs as calculated during the
> * current invocation. By the time the call returns, the CPUs may have in
> @@ -103,24 +104,11 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
> if (skip)
> continue;
>
> - if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
> + cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
> + cpumask_and(lowest_mask, lowest_mask, cp->cpudl->free_cpus);
> + if (cpumask_any(lowest_mask) >= nr_cpu_ids)
> continue;
>
> - if (lowest_mask) {
> - cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
> -
> - /*
> - * We have to ensure that we have at least one bit
> - * still set in the array, since the map could have
> - * been concurrently emptied between the first and
> - * second reads of vec->mask. If we hit this
> - * condition, simply act as though we never hit this
> - * priority level and continue on.
> - */
> - if (cpumask_any(lowest_mask) >= nr_cpu_ids)
> - continue;
> - }
> -
> return 1;
> }
>
> diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
> index 63cbb9c..acd7ccf 100644
> --- a/kernel/sched/cpupri.h
> +++ b/kernel/sched/cpupri.h
> @@ -18,6 +18,7 @@ struct cpupri_vec {
> struct cpupri {
> struct cpupri_vec pri_to_cpu[CPUPRI_NR_PRIORITIES];
> int *cpu_to_pri;
> + struct cpudl *cpudl;
> };
>
> #ifdef CONFIG_SMP
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 6725e3c..d28cfa4 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1349,14 +1349,17 @@ out:
> return cpu;
> }
>
> +static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);
> static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
> {
> + struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
> +
> /*
> * Current can't be migrated, useless to reschedule,
> * let's hope p can move out.
> */
> if (rq->curr->nr_cpus_allowed == 1 ||
> - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
> + !cpupri_find(&rq->rd->cpupri, rq->curr, lowest_mask))
> return;
>
> /*


Again; should you not put something useful in the mask before you pass
it to cpupri_find()?

> @@ -1364,7 +1367,7 @@ static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
> * see if it is pushed or pulled somewhere else.
> */
> if (p->nr_cpus_allowed != 1
> - && cpupri_find(&rq->rd->cpupri, p, NULL))
> + && cpupri_find(&rq->rd->cpupri, p, lowest_mask))
> return;
>
> /*
> @@ -1526,8 +1529,6 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
> return NULL;
> }
>
> -static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);
> -
> static int find_lowest_rq(struct task_struct *task)
> {
> struct sched_domain *sd;
> --
> 1.9.1
>