Re: [RFC 1/2] sched/fair: Fix load_balance() affinity redo path

From: Jeffrey Hugo
Date: Thu May 18 2017 - 10:32:03 EST


On 5/15/2017 8:56 AM, Dietmar Eggemann wrote:
> On 12/05/17 21:57, Jeffrey Hugo wrote:
>> On 5/12/2017 2:47 PM, Peter Zijlstra wrote:
>>> On Fri, May 12, 2017 at 11:01:37AM -0600, Jeffrey Hugo wrote:
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index d711093..8f783ba 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -8219,8 +8219,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>>>  		/* All tasks on this runqueue were pinned by CPU affinity */
>>>>  		if (unlikely(env.flags & LBF_ALL_PINNED)) {
>>>> +			struct cpumask tmp;

>>> You cannot have cpumasks on the stack.

>> Well, we need a temp variable to store the intermediate values, since
>> the cpumask_* operations are somewhat limited, and require a "storage"
>> parameter.
>>
>> Do you have any suggestions to meet all of these requirements?

> What about we use env.dst_grpmask and check if cpus is an improper
> subset of env.dst_grpmask? In this case we have to get rid of
> setting env.dst_grpmask = NULL in case of CPU_NEWLY_IDLE, which is
> IMHO not an issue since idle is passed via env into
> can_migrate_task(). And cpus has to be and'ed with
> sched_domain_span(env.sd).
>
> I'm not sure if this will work with 'not fully connected NUMA'
> (SD_OVERLAP) though ...

Hmm. I follow the idea, but I'm not confident it holds in the SD_OVERLAP case, and looking at your proposed code, it seems invasive to me - it requires changes in what would otherwise be unrelated sections of code. I'd prefer not to go in that direction. Also, it appears that dst_cpu would still be considered as a source of load.

We've got a different idea that addresses the stack issue while still keeping the change "contained"; I'll roll it into a V2 today or tomorrow.
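
For example, fair.c already avoids on-stack masks by using a per-CPU
scratch cpumask (load_balance_mask), so one option along those lines
(the name lb_tmp_mask is made up here) would be:

	/*
	 * Hypothetical per-CPU scratch mask, mirroring load_balance_mask.
	 * With CONFIG_CPUMASK_OFFSTACK=y it would need to be allocated
	 * per CPU at init time, as sched_init() does for
	 * load_balance_mask.
	 */
	static DEFINE_PER_CPU(cpumask_var_t, lb_tmp_mask);

	...

	/* in load_balance(), instead of "struct cpumask tmp" on the stack */
	struct cpumask *tmp = this_cpu_cpumask_var_ptr(lb_tmp_mask);

Per-CPU scratch storage would be safe here for the same reasons
load_balance_mask already is.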

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.