Re: [RFC PATCH 3/4] sched/topology: remove smt_gain

From: Qais Yousef
Date: Wed Aug 29 2018 - 10:08:43 EST


On 29/08/18 14:19, Vincent Guittot wrote:
smt_gain is used to compute the capacity of CPUs of a SMT core with the
constraint 1 < smt_gain < 2 in order to be able to compute number of CPUs
per core. The field has_free_capacity of struct numa_stat, which was the
last user of this computation of number of CPUs per core, has been removed
by :

commit 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()")

We can now remove this constraint on core capacity and use the default value
SCHED_CAPACITY_SCALE for SMT CPUs. With this removal, SCHED_CAPACITY_SCALE
becomes the maximum compute capacity of CPUs on every system. This should
help to simplify some code and remove fields like rd->max_cpu_capacity.

Furthermore, arch_scale_cpu_capacity() is used with a NULL sd in several other
places in the code when it wants the capacity of a CPU to scale
some metrics, like in PELT, deadline or schedutil. In the case of SMT, the value
returned is then not the capacity of the SMT CPUs but the default SCHED_CAPACITY_SCALE.

Remove the smt_gain field from the sched_domain struct.

You beat me to it. I got confused by smt_gain recently when I stumbled on it: I found out it's not used on ARM, and had to spend some time convincing myself it's not really necessary.

It was hard to track the history of this and *why* it's needed.

The only 'theoretical' case I found where smt_gain can be useful is when you have an asymmetric system, for example:

Node_A: 1 Core 2 Threads
Node_B: 1 Core 4 Threads

Then with smt_gain the group_capacity at the core level will be limited to smt_gain. But without smt_gain, Node_B will look twice as powerful as Node_A, which will affect balancing AFAICT, causing Node_B's single core to be oversubscribed as the 4 threads still have to share the same underlying hardware resources. I don't think such systems exist in practice, or even make sense, though.

So +1 from my side for the removal.

Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx (open list)
Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
include/linux/sched/topology.h | 1 -
kernel/sched/sched.h | 3 ---
kernel/sched/topology.c | 2 --
3 files changed, 6 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 2634774..212792b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -89,7 +89,6 @@ struct sched_domain {
unsigned int newidle_idx;
unsigned int wake_idx;
unsigned int forkexec_idx;
- unsigned int smt_gain;
int nohz_idle; /* NOHZ IDLE status */
int flags; /* See SD_* */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4a2e8ca..b1715b8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1758,9 +1758,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
static __always_inline
unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
{
- if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
- return sd->smt_gain / sd->span_weight;
-
return SCHED_CAPACITY_SCALE;
}
#endif
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 56a0fed..069c924 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1129,7 +1129,6 @@ sd_init(struct sched_domain_topology_level *tl,
.last_balance = jiffies,
.balance_interval = sd_weight,
- .smt_gain = 0,
.max_newidle_lb_cost = 0,
.next_decay_max_lb_cost = jiffies,
.child = child,
@@ -1155,7 +1154,6 @@ sd_init(struct sched_domain_topology_level *tl,
if (sd->flags & SD_SHARE_CPUCAPACITY) {
sd->flags |= SD_PREFER_SIBLING;
sd->imbalance_pct = 110;
- sd->smt_gain = 1178; /* ~15% */
} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
sd->flags |= SD_PREFER_SIBLING;


--
Qais Yousef