Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

From: Abel Wu
Date: Mon Sep 05 2022 - 10:43:09 EST


On 9/2/22 6:25 PM, Mel Gorman wrote:
For the simple case, I was expecting the static depth to *not* match load
because it's unclear what the scaling should be for load or if it had a
benefit. If investigating scaling the scan depth to load, it would still
make sense to compare it to a static depth. The depth of 2 cores was to
partially match the old SIS_PROP behaviour of the minimum depth to scan.

	if (span_avg > 4*avg_cost)
		nr = div_u64(span_avg, avg_cost);
	else
		nr = 4;

nr is not proportional to cores although it could be
https://lore.kernel.org/all/20210726102247.21437-7-mgorman@xxxxxxxxxxxxxxxxxxx/
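For reference, the old SIS_PROP floor quoted above can be modelled outside the kernel. This is only a sketch: sis_prop_depth() is a hypothetical helper name, plain 64-bit division stands in for div_u64(), and avg_cost is assumed to be non-zero. It shows that the depth never drops below 4 CPUs, which corresponds to 2 cores only if each core has 2 SMT siblings.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: mirrors the quoted
 * SIS_PROP snippet. The caller must pass a non-zero avg_cost.
 */
static uint64_t sis_prop_depth(uint64_t span_avg, uint64_t avg_cost)
{
	/* Scale the depth only when it would exceed the floor of 4 CPUs. */
	if (span_avg > 4 * avg_cost)
		return span_avg / avg_cost;	/* div_u64() in the kernel */

	/* Minimum scan depth: 4 CPUs, regardless of core topology. */
	return 4;
}
```

With span_avg = 100 and avg_cost = 10 the depth is 10; with span_avg = 30 and the same cost the floor of 4 applies.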

This is not tested or properly checked for correctness but for
illustrative purposes something like this should conduct a limited scan when
overloaded. It has a side-effect that the has_idle_cores hint gets cleared
for a partial scan for idle cores but the hint is probably wrong anyway.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6089251a4720..59b27a2ef465 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		if (sd_share) {
 			/* because !--nr is the condition to stop scan */
 			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
-			/* overloaded LLC is unlikely to have idle cpu/core */
-			if (nr == 1)
-				return -1;
+
+			/*
+			 * Non-overloaded case: Scan full domain if there is
+			 *   an idle core. Otherwise, scan for an idle
+			 *   CPU based on nr_idle_scan
+			 * Overloaded case: Unlikely to have an idle CPU but
+			 *   conduct a limited scan if there is potentially
+			 *   an idle core.
+			 */
+			if (nr > 1) {
+				if (has_idle_core)
+					nr = sd->span_weight;
+			} else {
+				if (!has_idle_core)
+					return -1;
+				nr = 2;
+			}
 		}
 	}
 	for_each_cpu_wrap(cpu, cpus, target + 1) {
+		if (!--nr)
+			break;
+
 		if (has_idle_core) {
 			i = select_idle_core(p, cpu, cpus, &idle_cpu);
 			if ((unsigned int)i < nr_cpumask_bits)
 				return i;
 		} else {
-			if (!--nr)
-				return -1;
 			idle_cpu = __select_idle_cpu(cpu, p);
 			if ((unsigned int)idle_cpu < nr_cpumask_bits)
 				break;
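
As a reading aid, the depth selection in the diff above can be modelled as a standalone function. This is a sketch under assumptions, not kernel code: scan_budget() and iterations_allowed() are hypothetical names, a return value of 0 stands in for the early "return -1", and a plain counting loop stands in for for_each_cpu_wrap().

```c
#include <stdbool.h>

/*
 * Hypothetical model of how the diff picks nr.
 * Returns 0 where the diff bails out with "return -1".
 */
static int scan_budget(int nr_idle_scan, bool has_idle_core, int span_weight)
{
	/* +1 because !--nr is the condition to stop the scan */
	int nr = nr_idle_scan + 1;

	if (nr > 1) {
		/* Non-overloaded: full domain if an idle core may exist. */
		if (has_idle_core)
			nr = span_weight;
	} else {
		/* Overloaded: skip unless an idle core may exist. */
		if (!has_idle_core)
			return 0;
		nr = 2;
	}
	return nr;
}

/*
 * Hypothetical model of the loop: counts how many CPUs get past the
 * "if (!--nr) break;" check placed at the top of the loop body.
 */
static int iterations_allowed(int nr, int ncpus)
{
	int visited = 0;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!--nr)
			break;
		visited++;
	}
	return visited;
}
```

In this model a budget of nr lets nr - 1 loop iterations run, so the overloaded case with nr = 2 reaches a single iteration before the loop breaks.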

I spent the last few days testing this, with 3 variations (assuming
has_idle_core):

a) full or limited (2 cores) scan when !nr_idle_scan
b) whether to clear sds->has_idle_core when a partial scan fails
c) whether to scale scan depth with load

some observations:

1) Not clearing sds->has_idle_core when a partial scan fails
always seems bad. It is due to repeated partial scans that
still cannot find an idle core. (The following observations
are based on clearing has_idle_core even in partial scans.)

2) An unconditional full scan when has_idle_core is set is not
good for netperf_{udp,tcp} and tbench4. This is probably because
the SIS success rate of these workloads is already high
enough (netperf ~= 100%, tbench4 ~= 50%, compared to
hackbench ~= 3.5%), which negates much of the benefit a full
scan brings.

3) Scaling scan depth with load seems good for the hackbench
socket tests and neutral for the pipe tests. I think this
is just the case you mentioned before: under fast wake-up
workloads has_idle_core becomes less reliable,
so a full scan won't always win.

Best Regards,
Abel