Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle_sibling code path

From: Rohit Jain
Date: Thu Sep 28 2017 - 11:10:03 EST


Hi Joel,

On 09/28/2017 05:53 AM, joelaf wrote:
Hi Rohit,

On Tue, Sep 26, 2017 at 12:48 PM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
[...]

<snip>
}

- if (idle)
- return core;
+ if (idle) {
+ if (rcpu == -1)
+ return (rcpu_backup != -1 ? rcpu_backup :
core);
+ return rcpu;
+ }

This didn't make much sense to me, here you are returning either an
SMT thread or a core. That doesn't make much of a difference because
SMT threads share the same capacity (SD_SHARE_CPUCAPACITY). I think
what you want to do is find out the capacity of a 'core', not an SMT
thread, and compare the capacity of different cores and consider the
one which has least RT/IRQ interference.

IIUC the capacities of each strand is scaled by IRQ and 'rt_avg' for that
'rq'. Now if the strand is idle now and gets an interrupt in the future,
the 'core' would look like:

+----+----+
| I | |
| T | |
+----+----+

(I -> Interrupt, T-> Thread we are trying to schedule).

whereas if the other strand on the core was taking interrupt the core
would look like:

+----+----+
| I | T |
| | |
+----+----+

With this case, because we know from the past avg, one of the strands is
running low on capacity, I am trying to return a better strand for the
thread to start on.

I know what you're trying to do but they way you've retrofitted it into the
core looks weird (to me) and makes the code unreadable and ugly IMO.

Why not do something simpler like skip the core if any SMT thread has been
running at lesser capacity? I'm not sure if this works great or if the maintainers
will prefer your or my below approach, but I find the below diff much cleaner
for the select_idle_core bit. It also makes more sense since resources are
shared at SMT level so makes sense to me to skip the core altogether for this:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6ee7242dbe0a..f324a84e29f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
for_each_cpu_wrap(core, cpus, target) {
bool idle = true;
+ bool full_cap = true;
for_each_cpu(cpu, cpu_smt_mask(core)) {
cpumask_clear_cpu(cpu, cpus);
if (!idle_cpu(cpu))
idle = false;
+ if (!full_capacity(cpu))
+ full_cap = false;
}
- if (idle)
+ if (idle && full_cap)
return core;
}


Well, with your changes you will skip over fully idle cores which is not
an ideal thing either. I see that you were advocating for select
idle+lowest capacity core, whereas I was stopping at the first idlecore.

Since the whole philosophy till now in this patch is "Don't spare an
idle CPU", I think the following diff might look better to you. Please
note this is only for discussion sakes, I haven't fully tested it yet.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ec15e5f..c2933eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
Âstatic int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
Â{
ÂÂÂÂ struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-ÂÂÂ int core, cpu;
+ÂÂÂ int core, cpu, rcpu, backup_core;
+
+ÂÂÂ rcpu = backup_core = -1;

ÂÂÂÂ if (!static_branch_likely(&sched_smt_present))
ÂÂÂÂ ÂÂÂ return -1;
@@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int

ÂÂÂÂ for_each_cpu_wrap(core, cpus, target) {
ÂÂÂÂ ÂÂÂ bool idle = true;
+ÂÂÂ ÂÂÂ bool full_cap = true;

ÂÂÂÂ ÂÂÂ for_each_cpu(cpu, cpu_smt_mask(core)) {
ÂÂÂÂ ÂÂÂ ÂÂÂ cpumask_clear_cpu(cpu, cpus);
ÂÂÂÂ ÂÂÂ ÂÂÂ if (!idle_cpu(cpu))
ÂÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ idle = false;
+
+ÂÂÂ ÂÂÂ ÂÂÂ if (!full_capacity(cpu)) {
+ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ full_cap = false;
+ÂÂÂ ÂÂÂ ÂÂÂ }
ÂÂÂÂ ÂÂÂ }

-ÂÂÂ ÂÂÂ if (idle)
+ÂÂÂ ÂÂÂ if (idle && full_cap)
ÂÂÂÂ ÂÂÂ ÂÂÂ return core;
+ÂÂÂ ÂÂÂ else if (idle && backup_core == -1)
+ÂÂÂ ÂÂÂ ÂÂÂ backup_core = core;
+ÂÂÂ }
+
+ÂÂÂ if (backup_core != -1) {
+ÂÂÂ ÂÂÂ for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
+ÂÂÂ ÂÂÂ ÂÂÂ if (full_capacity(cpu))
+ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ return cpu;
+ÂÂÂ ÂÂÂ ÂÂÂ else if ((rcpu == -1) ||
+ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ Â(capacity_of(cpu) > capacity_of(rcpu)))
+ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ rcpu = cpu;
+ÂÂÂ ÂÂÂ }
+
+ÂÂÂ ÂÂÂ return rcpu;
ÂÂÂÂ }


Do let me know what you think.

Thanks,
Rohit


thanks,

- Joel