[RFC] sched: unused cpu in affine workload

From: Jiri Olsa
Date: Mon Apr 04 2016 - 04:23:11 EST


hi,
we've noticed the following issue in one of our workloads.

I have a 24-CPU server with the following sched domains:
domain 0: (pairs)
domain 1: 0-5,12-17 (group1) 6-11,18-23 (group2)
domain 2: 0-23 level NUMA

I run a CPU-hogging workload on the following CPUs:
4,6,14,18,19,20,23

that is:
4,14 CPUs from group1
6,18,19,20,23 CPUs from group2

the workload process gets its affinity set up via
'taskset -c ${CPUs} workload ...' and forks a child for every CPU
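
For anyone who wants to try this without the real workload, here is a
minimal sketch of the same setup. It is not the actual workload: the
names spin() and run_hogs() are made up for illustration, and it relies
on os.sched_setaffinity(), which is Linux-only. It pins the parent to
the given CPUs and forks one busy-looping child per CPU, so the children
inherit the mask much like they would under taskset.

```python
import os
import time

# CPU list taken from the report above.
WORKLOAD_CPUS = {4, 6, 14, 18, 19, 20, 23}

def spin(seconds):
    """Burn CPU for roughly `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass

def run_hogs(cpus, seconds):
    """Restrict this process to `cpus`, then fork one hog per CPU.

    Returns the number of children that were forked and reaped.
    """
    os.sched_setaffinity(0, cpus)   # forked children inherit this mask
    children = []
    for _ in cpus:
        pid = os.fork()
        if pid == 0:
            # Child: hog a CPU, then exit without running parent cleanup.
            spin(seconds)
            os._exit(0)
        children.append(pid)
    for pid in children:
        os.waitpid(pid, 0)
    return len(children)
```

Calling run_hogs(WORKLOAD_CPUS, 60) on a machine that has those CPUs and
watching per-CPU usage in top should show where the hogs end up.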

very often we notice CPUs 4 and 14 running 3 processes of the workload
while CPUs 6,18,19,20,23 run just 4 processes, leaving one of the
CPUs from group2 idle

AFAICS from the code, the reason for this is that load balancing
follows the domain setup (topology) and does not take affinity setups
like this into account. The code in find_busiest_group running on an
idle cpu from group2 will find group1 as busiest, but its average load
will be smaller than that of the local group, so no task gets pulled.
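
To put numbers on this, here is a rough model of the comparison. It is
a sketch under simplifying assumptions, not the kernel code: every CPU
is given a capacity of 1024, every hog contributes a load of 1024, and
group capacity counts all 12 CPUs of a group regardless of affinity.
The avg_load() helper is made up for illustration.

```python
# Simplified model (assumption): each CPU has capacity 1024 and each
# runnable CPU hog contributes a load of 1024.
SCALE = 1024

# Domain 1 groups from the topology above.
GROUP1 = set(range(0, 6)) | set(range(12, 18))
GROUP2 = set(range(6, 12)) | set(range(18, 24))

def avg_load(nr_tasks, group_cpus):
    """Group load scaled by group capacity, counting every CPU in the group."""
    group_load = nr_tasks * SCALE
    group_capacity = len(group_cpus) * SCALE
    return group_load * SCALE // group_capacity

busiest = avg_load(3, GROUP1)  # 3 hogs squeezed onto CPUs 4 and 14
local = avg_load(4, GROUP2)    # 4 hogs on CPUs 6,18,19,20,23, one CPU idle

# No pull happens when the local group already looks at least as loaded.
pull = local < busiest
print(busiest, local, pull)    # prints: 256 341 False
```

In this model group2 looks more loaded than group1 (341 vs 256) even
though group1 has 3 hogs on 2 usable CPUs, because the average is taken
over all 12 CPUs of each group rather than over the CPUs the workload
is allowed to run on.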

It's obvious that the load balancer follows the sched domain topology.
However, is there some sched feature I'm missing that could help
with this? Or do we need to follow the sched domain topology when
we select CPUs for a workload to get even balancing?

thanks,
jirka