Re: [PATCH RESEND 0/3] Represent cluster topology and enable load balance between clusters

From: Barry Song
Date: Sat Oct 02 2021 - 03:10:28 EST


On Sat, Oct 2, 2021 at 12:22 PM Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2021-10-01 at 16:57 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 01, 2021 at 12:39:56PM +0200, Vincent Guittot wrote:
> > > Hi Barry,
> > >
> > > On Fri, 1 Oct 2021 at 12:32, Barry Song <21cnbao@xxxxxxxxx> wrote:
> > > > Hi Vincent, Dietmar, Peter, Ingo,
> > > > Do you have any comment on this first series which exposes
> > > > cluster topology
> > > > of ARM64 kunpeng 920 & x86 Jacobsville and supports load balance
> > > > only for
> > > > the 1st stage?
> > > > I will be very grateful for your comments so that things can move
> > > > forward in the
> > > > right direction. I think Tim also looks forward to bringing up
> > > > cluster
> > > > support in
> > > > Jacobsville.
> > >
> > > This patchset makes sense to me and the addition of a new
> > > scheduling
> > > level to better reflect the HW topology goes in the right
> > > direction.
> >
> > So I had a look, dreading the select-idle-sibling changes, and was
> > pleasantly surprised they're gone :-)

Thanks, Peter and Vincent for reviewing.

My small scheduler team is still working hard on the
select-idle-sibling changes.
They will be sent as a separate series, as an improvement on top of this one.
I promise the wake-affine series won't be that scary when you see it
next time :-)

> >
> > As is, this does indeed look like something mergable without too much
> > hassle.
> >
> > The one question I have is, do we want default y?
>
> I also agree that default y is preferable.

Thanks, Tim, for your comments.
I am OK with making it default "Y" for x86, with better documentation as below:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd27b1cdac34..940eb1fe0abb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1002,12 +1002,17 @@ config NR_CPUS
 	  to the kernel image.

 config SCHED_CLUSTER
-	bool "Cluster scheduler support"
-	default n
+	def_bool y
+	prompt "Cluster scheduler support"
 	help
 	  Cluster scheduler support improves the CPU scheduler's decision
-	  making when dealing with machines that have clusters of CPUs
-	  sharing L2 cache. If unsure say N here.
+	  making when dealing with machines that have clusters of CPUs.
+	  Cluster usually means a couple of CPUs which are placed closely
+	  by sharing mid-level caches, last-level cache tags or internal
+	  busses. For example, on x86 Jacobsville, each 4 CPUs share one
+	  L2 cache. This feature isn't a universal win because it can bring
+	  a cost of slightly increased overhead in some places. If unsure
+	  say N here.

This also aligns well with SCHED_MC and SCHED_SMT in arch/x86/Kconfig:
config SCHED_MC
	def_bool y
	prompt "Multi-core scheduler support"

config SCHED_SMT
	def_bool y if SMP

But ARM64 follows a different tradition: arch/arm64/Kconfig has
SCHED_MC and SCHED_SMT as below:
config SCHED_MC
	bool "Multi-core scheduler support"
	help
	  ...

config SCHED_SMT
	bool "SMT scheduler support"
	help
	  ...

I don't want to be the odd man out :-) So for ARM64, I vote for keeping
the Kconfig file as is. I am planning to modify arch/arm64/defconfig
in the second patchset (select-idle-sibling) by adding
CONFIG_SCHED_CLUSTER=y
since load-balance plus wake-affine changes seem to make the cluster
scheduler a win much more widely on Kunpeng 920, while doing
load-balance only can sometimes hurt. So I don't mind keeping "N" for
a while on the ARM64 platform.
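
For illustration only (not part of the patch): a minimal Python sketch of
how one might check what a given .config or defconfig ends up with for
this option. The helper name kconfig_state is hypothetical; the sketch
only assumes the usual "CONFIG_X=y" and "# CONFIG_X is not set" line
formats.

```python
def kconfig_state(config_text, symbol):
    """Return the value of a Kconfig symbol in .config-style text.

    Returns the string after '=' (e.g. "y" or "m"), or "n" when the symbol
    is marked "is not set" or does not appear at all.
    """
    name = symbol if symbol.startswith("CONFIG_") else "CONFIG_" + symbol
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "n"
        # The trailing '=' keeps CONFIG_SCHED_CLUSTER from matching
        # a longer symbol that merely shares the same prefix.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return "n"
```

With the arm64 Kconfig left as is, a defconfig without the extra line
reports "n" for SCHED_CLUSTER, and adding CONFIG_SCHED_CLUSTER=y to the
defconfig flips it to "y" without touching the Kconfig default.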

>
> >
> > The one nit I have is the Kconfig text, I'm not really sure that's
> > clarifying what a cluster is.
>
> Do you have a preference of a different name other than cluster?
> Or simply better documentation on what a cluster is for ARM64
> and x86 in Kconfig?

Anyway, naming is really hard. "Cluster" doesn't seem a bad name for
ARM SoCs: besides Kunpeng, some other ARM SoCs also use this name in
their specifications, for example Neoverse N1, Phytium, etc.

Shall we use the same name for x86 and ARM and just refine the documentation
as below? Does the doc below explain what a "cluster" is any better?

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7e4651a1aaf4..86821e83b935 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -993,8 +993,13 @@ config SCHED_CLUSTER
 	bool "Cluster scheduler support"
 	help
 	  Cluster scheduler support improves the CPU scheduler's decision
-	  making when dealing with machines that have clusters(sharing internal
-	  bus or sharing LLC cache tag). If unsure say N here.
+	  making when dealing with machines that have clusters of CPUs.
+	  Cluster usually means a couple of CPUs which are placed closely
+	  by sharing mid-level caches, last-level cache tags or internal
+	  busses. For example, on Hisilicon Kunpeng920, each 4 CPUs share
+	  LLC cache tags. This feature isn't a universal win because it
+	  can bring a cost of slightly increased overhead in some places.
+	  If unsure say N here.

 config SCHED_SMT
 	bool "SMT scheduler support"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd27b1cdac34..940eb1fe0abb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1002,12 +1002,17 @@ config NR_CPUS
 	  to the kernel image.

 config SCHED_CLUSTER
-	bool "Cluster scheduler support"
-	default n
+	def_bool y
+	prompt "Cluster scheduler support"
 	help
 	  Cluster scheduler support improves the CPU scheduler's decision
-	  making when dealing with machines that have clusters of CPUs
-	  sharing L2 cache. If unsure say N here.
+	  making when dealing with machines that have clusters of CPUs.
+	  Cluster usually means a couple of CPUs which are placed closely
+	  by sharing mid-level caches, last-level cache tags or internal
+	  busses. For example, on x86 Jacobsville, each 4 CPUs share one
+	  L2 cache. This feature isn't a universal win because it can bring
+	  a cost of slightly increased overhead in some places. If unsure
+	  say N here.

 config SCHED_SMT
 	def_bool y if SMP
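
To make the "each 4 CPUs share ..." wording concrete, here is a rough
Python sketch (again, not part of the patch) that lists which CPUs the
kernel groups into one cluster. It assumes that each CPU's cluster
siblings are exposed as topology/cluster_cpus_list under
/sys/devices/system/cpu; the function names are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a cpulist string such as "0-3,8" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single CPU ("8") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def read_clusters(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct CPU clusters as sorted tuples of CPU ids.

    Returns an empty list when the directory is missing or when no
    cluster_cpus_list files are present (assumed sysfs layout).
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    clusters = set()
    for path in root.glob("cpu[0-9]*/topology/cluster_cpus_list"):
        try:
            clusters.add(tuple(parse_cpulist(path.read_text())))
        except (OSError, ValueError):
            continue  # skip unreadable or unexpected entries
    return sorted(clusters)
```

On a machine matching the examples in the help text, I would expect
read_clusters() to return groups of four CPUs each; on a kernel without
cluster support it simply returns an empty list.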


>
> Thanks.
>
> Tim
>

Thanks
barry