RE: [patch 00/53] x86/topology: The final installment

From: Michael Kelley (LINUX)
Date: Sat Aug 12 2023 - 09:52:16 EST


From: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Sent: Monday, August 7, 2023 6:53 AM
>
> Hi!
>
> This is the (for now) last part of reworking topology enumeration and
> management. It's based on the APIC and CPUID rework series which can be
> found here:
>
> https://lore.kernel.org/lkml/20230802101635.459108805@xxxxxxxxxxxxx/
>
> With these preparatory changes in place, it's now possible to address the
> real issues of the current topology code:
>
> - Wrong core count on hybrid systems
>
> - Heuristics-based size information for packages and dies, which
> fails to work correctly with certain command line parameters.
>
> - Complete evaluation failure for a theoretical hybrid system which
> boots from an E-core
>
> - The complete insanity of manipulating global data from firmware parsers
> or the XEN/PV fake SMP enumeration. The latter is really a piece of art.
>
> This series addresses this by
>
> - Mopping up some more historical technical debt
>
> - Consolidating all topology relevant functionality into one place
>
> - Providing separate interfaces for boot time and ACPI hotplug operations
>
> - A sane ordering of command line options and restrictions
>
> - A sensible way to handle the BSP problem in kdump kernels instead of
> the unreliable command line option.
>
> - Confinement of topology relevant variables by replacing the XEN/PV SMP
> enumeration fake with something halfway sensible.
>
> - Evaluation of sizes by analysing the topology via the CPUID provided
> APIC ID segmentation and the actual APIC IDs which are registered at
> boot time.
>
> - Removal of heuristics and broken size calculations
>
> The idea behind this is the following:
>
> The APIC IDs describe the system topology in multiple domain levels. The
> CPUID topology parser provides the information about which part of the
> APIC ID is associated with the individual levels (Intel terminology):
>
> [ROOT][PACKAGE][DIE][TILE][MODULE][CORE][THREAD]
>
> The root space contains the package (socket) IDs. Levels which are not
> enumerated consume 0 bits of space, but conceptually they are always
> represented. If, e.g., only the CORE and THREAD levels are enumerated,
> then DIE, TILE and MODULE have the same physical ID as PACKAGE.
>
> If SMT is not supported, then the THREAD domain is still used. It then
> has the same physical ID as the CORE domain and is the only child of
> the core domain.
>
> This allows a unified view of the system, independent of the enumerated
> domain levels without requiring any conditionals in the code.
>
> AMD only exposes 4 domain levels, with obviously different terminology,
> but these can easily be mapped onto the Intel variant with a trivial lookup
> table added to the CPUID parser.
>
> The resulting topology information of an ADL hybrid system with 8 P-Cores
> and 8 E-Cores looks like this:
>
> CPU topo: Max. logical packages: 1
> CPU topo: Max. logical dies: 1
> CPU topo: Max. dies per package: 1
> CPU topo: Max. threads per core: 2
> CPU topo: Num. cores per package: 16
> CPU topo: Num. threads per package: 24
> CPU topo: Allowing 24 present CPUs plus 0 hotplug CPUs
> CPU topo: Thread : 24
> CPU topo: Core : 16
> CPU topo: Module : 1
> CPU topo: Tile : 1
> CPU topo: Die : 1
> CPU topo: Package : 1
>
> This happens on the boot CPU before any of the APs are started and
> provides correct size information right from the start.
>
> Even the XEN/PV trainwreck makes use of this now. On Dom0 it utilizes the
> MADT and on DomU it provides fake APIC IDs, which, combined with the
> provided CPUID information, make it at least look halfway realistic instead
> of claiming to have one CPU per package as the current upstream code does.
>
> This is solely addressing the core topology issues, but there is a plan for
> further consolidation of other topology related information into one single
> source of information instead of having a gazillion localized special
> parsers and representations all over the place. There are quite a few other
> things which can be simplified on top of this, like updating the various
> cpumasks during CPU bringup, but that's all left for later.
>
> So another 53 patches later, the resulting diffstat is:
>
> 64 files changed, 830 insertions(+), 955 deletions(-)
>
> and the combo diffstat of all three series combined:
>
> 115 files changed, 2414 insertions(+), 3035 deletions(-)
>
> The current series applies on top of
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3
>
> and is available from git here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v1
>
> Thanks,
>
> tglx

Tested the full series on Hyper-V VMs on Intel and AMD Zen processors.
Tested with hyper-threading enabled and disabled, and with a variety of
NUMA and L3 cache configurations. All looks good, modulo the known
issue with Hyper-V providing incorrect APIC IDs in some NUMA configs,
but this patch series did not make that problem any worse.

Tested-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>