Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

From: Abel Wu
Date: Wed Sep 07 2022 - 03:52:49 EST


On 9/6/22 5:57 PM, Mel Gorman wrote:
On Mon, Sep 05, 2022 at 10:40:00PM +0800, Abel Wu wrote:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6089251a4720..59b27a2ef465 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
if (sd_share) {
/* because !--nr is the condition to stop scan */
nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
- /* overloaded LLC is unlikely to have idle cpu/core */
- if (nr == 1)
- return -1;
+
+ /*
+ * Non-overloaded case: Scan full domain if there is
+ * an idle core. Otherwise, scan for an idle
+ * CPU based on nr_idle_scan
+ * Overloaded case: Unlikely to have an idle CPU but
+ * conduct a limited scan if there is potentially
+ * an idle core.
+ */
+ if (nr > 1) {
+ if (has_idle_core)
+ nr = sd->span_weight;
+ } else {
+ if (!has_idle_core)
+ return -1;
+ nr = 2;
+ }
}
}
for_each_cpu_wrap(cpu, cpus, target + 1) {
+ if (!--nr)
+ break;
+
if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
if ((unsigned int)i < nr_cpumask_bits)
return i;
} else {
- if (!--nr)
- return -1;
idle_cpu = __select_idle_cpu(cpu, p);
if ((unsigned int)idle_cpu < nr_cpumask_bits)
break;

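For anyone reading along, the new branching above can be condensed into a standalone sketch (a hypothetical helper for illustration only, not the actual kernel function; the names mirror the kernel code):

```c
/*
 * Given the SIS_UTIL hint nr_idle_scan, whether the LLC is believed
 * to have an idle core, and the domain size, return how many CPUs
 * the wakeup path should visit, or -1 to skip scanning entirely.
 */
int scan_depth(int nr_idle_scan, int has_idle_core, int span_weight)
{
	/* because !--nr is the condition to stop scan */
	int nr = nr_idle_scan + 1;

	if (nr > 1) {
		/* Non-overloaded: full scan if an idle core may
		 * exist, otherwise the depth from nr_idle_scan. */
		if (has_idle_core)
			nr = span_weight;
	} else {
		/* Overloaded: an idle CPU is unlikely; only conduct
		 * a limited scan if an idle core is still possible. */
		if (!has_idle_core)
			return -1;
		nr = 2;
	}
	return nr;
}
```

So e.g. on an overloaded LLC without an idle core the scan is skipped, while a non-overloaded LLC with an idle core is scanned in full.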
I spent the last few days testing this, with 3 variations (assume
has_idle_core):

a) full or limited (2cores) scan when !nr_idle_scan
b) whether clear sds->has_idle_core when partial scan failed
c) scale scan depth with load or not

some observations:

1) It seems always bad not to clear sds->has_idle_core when a
   partial scan fails: cycles are spent on the partial scan, but
   an idle core still cannot be found. (The following results are
   based on clearing has_idle_core even in partial scans.)


Ok, that's rational. There will be corner cases where there was no idle
CPU near the target when there is an idle core far away but it would be
counter to the purpose of SIS_UTIL to care about that corner case.

Yes, and this corner case (which may actually become the normal case
sometimes) will continue to exist as long as we scan in a linear
fashion with limited depth (SIS_UTIL/SIS_PROP/...), especially as
LLCs keep getting larger nowadays.


2) An unconditional full scan when has_idle_core is set is not
   good for netperf_{udp,tcp} and tbench4. This is probably
   because the SIS success rate of these workloads is already
   high enough (netperf ~= 100%, tbench4 ~= 50%, compared to
   hackbench ~= 3.5%), which negates a lot of the benefit a full
   scan brings.


That's also rational. For a single client/server on netperf, it's expected
that the SIS success rate is high and scanning is minimal. As the client
and server are sharing data on localhost and somewhat synchronous, it may
even partially benefit from SMT sharing.

So basic approach would be "always clear sds->has_idle_core" + "limit
scan even when has_idle_core is true", right?

Yes, exactly the same as what you suggested in the first place.


If so, stick a changelog on it and resend!

I will include this in the SIS_FILTER patchset.


3) Scaling scan depth with load seems good for the hackbench
   socket tests, and neutral in the pipe tests. And I think this
   is just the case you mentioned before: under fast wake-up
   workloads has_idle_core becomes less reliable, so a full scan
   won't always win.


My recommendation is to leave this out for now but, assuming the rest of
the patches get picked up, consider posting it for the next major kernel
version (i.e. separate the basic and clever approaches by one major kernel
version). By separating them, there is less chance of a false positive
bisection pointing to the wrong patch. Any regression will not be perfectly
reproducible, so the chances of a false positive bisection are relatively high.

Makes sense. I will just include the basic part first.

Thanks,
Abel