Re: [PATCH] arm64: smp: Skip MC domain for SoCs without shared cache

From: Darren Hart
Date: Wed Feb 23 2022 - 11:39:38 EST


On Wed, Feb 23, 2022 at 09:19:17AM +0100, Vincent Guittot wrote:

...

> > > AFAICT, this CLUSTER level is only supported by ACPI. In
> > > parse_acpi_topology() you should be able to know if cluster level is
> > > above or below the level returned by acpi_find_last_cache_level() and
> > > set the correct topology table accordingly
> > >
> >
> > Thanks Vincent,
> >
> > This made sense to me as a place to start. However, the more I dug into the
> > ACPI PPTT code, the more I ran into conflicts with the API that would require
> > extending it in ways that seem contrary to its intent. For example, the exposed
> > API uses kernel logical CPUs rather than the topology IDs (needed for working
> > with processor containers). The cpu_topology masks haven't been populated yet,
> > and acpi_find_last_cache_level() is decoupled from the CPU topology level. So
> > what we're really testing for is whether the cluster cpumask is a subset of the
> > coregroup cpumask, and it made the most sense to me to keep that in smp.c after
> > the cpumasks have been updated and stored.
>
> I'm not sure why you want to compare cpumasks when you can directly
> compare topology levels, which is exactly what we are looking for in
> order to correctly order the scheduler topology. I was expecting
> something like the below to be enough. acpi_find_cluster_level() needs
> to be created and should be similar to
> find_acpi_cpu_topology_cluster() but return the level instead of the id. The
> main advantage is that everything is contained in topology.c, which
> makes sense as we are playing with topology.

Hi Vincent,

This was my thinking as well before I dug into the ACPI PPTT interfaces.

The CPU topology levels and the cache levels are independent. Assuming I've
not misunderstood the implementation, acpi_find_last_cache_level() returns the
highest *cache* level described in the PPTT for a given CPU.

For the Ampere Altra, for example:

CPU Topo  1 System
Level     2   Package
  |       3     Cluster
  |       4       Processor --- L1 I Cache \____ L2 U Cache -x
  \/                        --- L1 D Cache /

          4       Processor --- L1 I Cache \____ L2 U Cache -x
                            --- L1 D Cache /

Cache Level ---->              1                  2

acpi_find_last_cache_level() returns "2" for any logical CPU, but this doesn't
tell us anything about the CPU topology level across which this cache is shared.

Because the cache level alone doesn't identify the topology level that shares
the cache, I moved out of topology.c and into smp.c, where the various topology
masks have already been populated and can be compared. Before sending a cpumask
implementation, I'll spend a bit more time looking for a better point to do
this, one where the CPU topology level with the shared cache is more readily
available.

>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 9ab78ad826e2..4dac0491b7e3 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -84,6 +84,7 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>  int __init parse_acpi_topology(void)
>  {
>  	int cpu, topology_id;
> +	bool default_cluster_topology = true;
>
>  	if (acpi_disabled)
>  		return 0;
> @@ -119,8 +120,16 @@ int __init parse_acpi_topology(void)
>  			if (cache_id > 0)
>  				cpu_topology[cpu].llc_id = cache_id;
>  		}
> +
> +		if (default_cluster_topology &&
> +		    (i < acpi_find_cluster_level(cpu))) {

Per above, from what I understand, this compares a cache level (i, the level
returned by acpi_find_last_cache_level()) with a CPU topology level, and the
two are independent of each other.

> +			default_cluster_topology = false;
> +		}
>  	}
>
> +	if (!default_cluster_topology)
> +		set_sched_topology(arm64_no_mc_topology);
> +
>  	return 0;
>  }

Thanks,

--
Darren Hart
Ampere Computing / OS and Kernel