[PATCH v3 2/3] sched/core: Don't mix isolcpus and housekeeping CPUs

From: Srikar Dronamraju
Date: Thu Oct 25 2018 - 14:42:45 EST


The load balancer and the NUMA balancer are not supposed to operate on
isolcpus.

Currently, when cpus_allowed is set for a task, nothing checks whether the
requested cpumask contains both isolcpus and housekeeping CPUs.

If the user passes a mix of isolcpus and housekeeping CPUs, the NUMA
balancer can pick an isolcpu to schedule the task on. With this change, if a
combination of isolcpus and housekeeping CPUs is provided, the mask is
restricted to the housekeeping CPUs only.
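
The rule described above can be sketched in userspace C. This is only an
illustration: mask32_t, mask_subset(), mask_intersects() and
restrict_mixed_mask() are hypothetical stand-ins for the kernel cpumask
helpers, a mask is a plain 32-bit word, and the sketch assumes a non-empty
housekeeping mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n (n < 32). */
typedef uint32_t mask32_t;

/* True when every CPU in 'a' is also in 'b'. */
static bool mask_subset(mask32_t a, mask32_t b)
{
	return (a & ~b) == 0;
}

/* True when 'a' and 'b' share at least one CPU. */
static bool mask_intersects(mask32_t a, mask32_t b)
{
	return (a & b) != 0;
}

/*
 * Return the mask a task ends up with under the rule above: a request
 * that mixes isolated and housekeeping CPUs is cut down to its
 * housekeeping part; any other request is kept unchanged.
 */
static mask32_t restrict_mixed_mask(mask32_t requested, mask32_t housekeeping)
{
	/* Assumption of this sketch: at least one housekeeping CPU exists. */
	assert(housekeeping != 0);

	if (!mask_subset(requested, housekeeping) &&
	    mask_intersects(requested, housekeeping))
		return requested & housekeeping;
	return requested;
}
```

A request for isolated CPUs only is left alone, so a task can still be
pinned to isolcpus explicitly; only mixed requests are narrowed.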

For example: System with 32 CPUs
$ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed: ffffdddd
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
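
The Cpus_allowed value above follows directly from the isolcpus list. A
small userspace sketch of the arithmetic, assuming a 32-CPU system so that
one 32-bit word holds the whole mask (mask_from_cpus() and
housekeeping_from_isolated() are hypothetical helper names, not kernel
functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 32-bit mask with one bit set per listed CPU (each must be < 32). */
static uint32_t mask_from_cpus(const unsigned int *cpus, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		assert(cpus[i] < 32);
		mask |= UINT32_C(1) << cpus[i];
	}
	return mask;
}

/* On a 32-CPU system, housekeeping CPUs are all CPUs that are not isolated. */
static uint32_t housekeeping_from_isolated(uint32_t isolated)
{
	return ~isolated;
}
```

For isolcpus=1,5,9,13 the isolated mask is 0x00002222, and its complement
is 0xffffdddd, which matches the Cpus_allowed line shown above.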

The test runs "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0
-P 3072 -T 0 -l 50 -c -s 1000", which calls sched_setaffinity() with a mask
covering all CPUs in the system.
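
For readers without perf at hand, the same kind of request can be made
from a small program. This is a minimal sketch, not what perf bench does
internally; request_all_cpus() is a hypothetical name, and the sketch
assumes glibc (for the CPU_* macros) and no more than CPU_SETSIZE
configured CPUs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/*
 * Ask for every configured CPU, then read back what the kernel actually
 * granted. Returns the number of CPUs in the resulting affinity mask,
 * or -1 on error.
 */
static int request_all_cpus(void)
{
	cpu_set_t set;
	long ncpus = sysconf(_SC_NPROCESSORS_CONF);

	if (ncpus < 1 || ncpus > CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	for (long cpu = 0; cpu < ncpus; cpu++)
		CPU_SET(cpu, &set);

	/* A pid of 0 means the calling thread. */
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -1;
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;

	/* A thread that can run at all has at least one allowed CPU. */
	assert(CPU_COUNT(&set) >= 1);
	return CPU_COUNT(&set);
}
```

Going by the Cpus_allowed_list output in this changelog, the 32-CPU example
system would be expected to report 32 CPUs without the patch and 28 with
it.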

Without patch
------------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list: 0-31
/proc/2107/task/2196/status:Cpus_allowed_list: 0-31
/proc/2107/task/2197/status:Cpus_allowed_list: 0-31
/proc/2107/task/2198/status:Cpus_allowed_list: 0-31
/proc/2107/task/2199/status:Cpus_allowed_list: 0-31
/proc/2107/task/2200/status:Cpus_allowed_list: 0-31
/proc/2107/task/2201/status:Cpus_allowed_list: 0-31
/proc/2107/task/2202/status:Cpus_allowed_list: 0-31
/proc/2107/task/2203/status:Cpus_allowed_list: 0-31

With patch
----------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31

Signed-off-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
---
Changelog v2->v3:
The detection is moved from sched_setaffinity to set_cpus_allowed_common,
so that it covers every path that sets a task's cpus_allowed.

kernel/sched/core.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3064e0f..37e62b8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1003,7 +1003,19 @@ static int migration_cpu_stop(void *data)
  */
 void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
 {
-	cpumask_copy(&p->cpus_allowed, new_mask);
+	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+			cpumask_intersects(new_mask, hk_mask))
+		cpumask_and(&p->cpus_allowed, new_mask, hk_mask);
+	else
+		cpumask_copy(&p->cpus_allowed, new_mask);
+
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
 }

--
1.8.3.1