Re: [RESEND PATCH] sched/fair: consider RT/IRQ pressure in select_idle_sibling

From: Rohit Jain
Date: Tue Jan 30 2018 - 14:43:32 EST


Hi Joel,

On 1/29/2018 7:39 PM, Joel Fernandes wrote:
Hi Rohit,

On Mon, Jan 29, 2018 at 3:27 PM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
Currently, the fast path in the scheduler looks for an idle CPU to
schedule threads on. Capacity is taken into account in the function
'select_task_rq_fair' when it calls 'wake_cap'; however, that check
looks at the original capacity and ignores the instantaneous capacity.
Furthermore, the select_idle_sibling path of the code ignores the
RT/IRQ threads which are also running on the CPUs it is looking to
schedule fair threads on.

We don't necessarily have to force the code into the slow path (by
modifying wake_cap); instead, we can make a better selection of the
CPU within the current domain itself (i.e. in the fast path).

This patch makes the fast path aware of capacity, resulting in overall
performance improvements as shown in the test results.

[...]
I also ran uperf and sysbench MySQL workloads but saw no statistically
significant change.

Signed-off-by: Rohit Jain <rohit.k.jain@xxxxxxxxxx>
---
kernel/sched/fair.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 26a71eb..ce5ccf8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5625,6 +5625,11 @@ static unsigned long capacity_orig_of(int cpu)
return cpu_rq(cpu)->cpu_capacity_orig;
}

+static inline bool full_capacity(int cpu)
+{
+ return capacity_of(cpu) >= (capacity_orig_of(cpu)*3)/4;
+}
+
static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
@@ -6081,7 +6086,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int

for_each_cpu(cpu, cpu_smt_mask(core)) {
cpumask_clear_cpu(cpu, cpus);
- if (!idle_cpu(cpu))
+ if (!idle_cpu(cpu) || !full_capacity(cpu))
idle = false;
}
There's some difference in logic between select_idle_core and
select_idle_cpu as far as the full_capacity stuff you're adding goes.
In select_idle_core, if all CPUs are !full_capacity, you're returning
-1. But in select_idle_cpu you're returning the best idle CPU, i.e.
the one with the most capacity among the !full_capacity ones. Why is
there this difference in logic? Did I miss something?

Below is the previous discussion on this same code. I measured the
performance difference and saw no statistically significant impact,
hence I went with your suggestion of keeping the code simpler.

https://lkml.org/lkml/2017/10/3/1001
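
For reference, the select_idle_cpu() hunk is not quoted here; what it
does, as you describe it, is roughly the following (a sketch with
illustrative local names, not the literal hunk): prefer an idle CPU
that still has full capacity, and otherwise fall back to the idle CPU
with the most remaining capacity instead of returning -1.

	int best_cpu = -1;
	unsigned long best_cap = 0;

	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
			continue;
		if (!idle_cpu(cpu))
			continue;
		/* An idle CPU free of RT/IRQ pressure wins outright. */
		if (full_capacity(cpu))
			return cpu;
		/* Otherwise remember the idle CPU with the most capacity left. */
		if (capacity_of(cpu) > best_cap) {
			best_cap = capacity_of(cpu);
			best_cpu = cpu;
		}
	}

	return best_cpu;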


@@ -6102,7 +6107,8 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
*/
static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
{
- int cpu;
+ int cpu, rcpu = -1;
+ unsigned long max_cap = 0;

if (!static_branch_likely(&sched_smt_present))
return -1;
@@ -6110,11 +6116,13 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
for_each_cpu(cpu, cpu_smt_mask(target)) {
if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
continue;
- if (idle_cpu(cpu))
- return cpu;
+ if (idle_cpu(cpu) && (capacity_of(cpu) > max_cap)) {
+ max_cap = capacity_of(cpu);
+ rcpu = cpu;
At the SMT level, do you need to bother with choosing the best
capacity among threads? If RT is eating into one SMT thread's
underlying capacity, it would eat into the other's as well. I'm
wondering what the benefit of doing this here is.

Yes, you are right because of SD_SHARE_CPUCAPACITY; however, the
benefit is that if we don't do this check, we might end up picking an
SMT thread which has "high" RT/IRQ activity, and the task could then
sit on the run queue for a while, till the pull side can bail us out.
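
To spell out the SMT-level policy being discussed (a sketch only, with
the loop completed for readability; not the exact patch text): among
the idle siblings of the target that the task is allowed on, pick the
one with the most capacity left after RT/IRQ pressure, or return -1 if
none is idle.

	for_each_cpu(cpu, cpu_smt_mask(target)) {
		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
			continue;
		/* Track the idle sibling least pressured by RT/IRQ time. */
		if (idle_cpu(cpu) && capacity_of(cpu) > max_cap) {
			max_cap = capacity_of(cpu);
			rcpu = cpu;
		}
	}

	return rcpu;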

Thanks,
Rohit

thanks,

- Joel