Re: [patch 00/53] x86/topology: The final installment

From: Juergen Gross
Date: Tue Aug 08 2023 - 12:53:05 EST


On 07.08.23 15:52, Thomas Gleixner wrote:
Hi!

This is the (for now) last part of reworking topology enumeration and
management. It's based on the APIC and CPUID rework series which can be
found here:

https://lore.kernel.org/lkml/20230802101635.459108805@xxxxxxxxxxxxx

With these preparatory changes in place, it's now possible to address the
real issues of the current topology code:

- Wrong core count on hybrid systems

- Heuristics-based size information for packages and dies, which fails
to work correctly with certain command line parameters.

- Complete evaluation failure for a theoretical hybrid system which boots
from an E-core

- The complete insanity of manipulating global data from firmware parsers
or the XEN/PV fake SMP enumeration. The latter is really a piece of art.

This series addresses these issues by:

- Mopping up some more historical technical debt

- Consolidating all topology relevant functionality into one place

- Providing separate interfaces for boot time and ACPI hotplug operations

- A sane ordering of command line options and restrictions

- A sensible way to handle the BSP problem in kdump kernels instead of
the unreliable command line option.

- Confinement of topology relevant variables by replacing the XEN/PV SMP
enumeration fake with something halfway sensible.

- Evaluation of sizes by analysing the topology via the CPUID-provided
APIC ID segmentation and the actual APIC IDs which are registered at
boot time.

- Removal of heuristics and broken size calculations

The idea behind this is the following:

The APIC IDs describe the system topology in multiple domain levels. The
CPUID topology parser provides the information about which part of the APIC
ID is associated with the individual levels (Intel terminology):

[ROOT][PACKAGE][DIE][TILE][MODULE][CORE][THREAD]

The root space contains the package (socket) IDs. Levels which are not
enumerated consume 0 bits of space, but conceptually they are always
represented. If e.g. only the CORE and THREAD levels are enumerated, then
the DIE, MODULE and TILE have the same physical ID as the PACKAGE.

If SMT is not supported, then the THREAD domain is still used. It then
has the same physical ID as the CORE domain and is the only child of
the core domain.

This allows a unified view of the system, independent of the enumerated
domain levels, without requiring any conditionals in the code.

AMD only exposes 4 domain levels, with obviously different terminology,
but these can easily be mapped onto the Intel variant with a trivial lookup
table added to the CPUID parser.

The resulting topology information of an ADL hybrid system with 8 P-Cores
and 8 E-Cores looks like this:

CPU topo: Max. logical packages: 1
CPU topo: Max. logical dies: 1
CPU topo: Max. dies per package: 1
CPU topo: Max. threads per core: 2
CPU topo: Num. cores per package: 16
CPU topo: Num. threads per package: 24
CPU topo: Allowing 24 present CPUs plus 0 hotplug CPUs
CPU topo: Thread : 24
CPU topo: Core : 16
CPU topo: Module : 1
CPU topo: Tile : 1
CPU topo: Die : 1
CPU topo: Package : 1

This is happening on the boot CPU before any of the APs is started and
provides correct size information right from the start.

Even the XEN/PV trainwreck makes use of this now. On Dom0 it utilizes the
MADT and on DomU it provides fake APIC IDs, which, combined with the
provided CPUID information, make it at least look halfway realistic instead
of claiming to have one CPU per package as the current upstream code does.

This is solely addressing the core topology issues, but there is a plan for
further consolidation of other topology related information into one single
source of information instead of having a gazillion localized special
parsers and representations all over the place. There are quite a few other
things which can be simplified on top of this, like updating the various
cpumasks during CPU bringup, but that's all left for later.

So another 53 patches later, the resulting diffstat is:

64 files changed, 830 insertions(+), 955 deletions(-)

and the combo diffstat of all three series combined:

115 files changed, 2414 insertions(+), 3035 deletions(-)

The current series applies on top of

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3

and is available from git here:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v1

Tested on an Intel system with Xen:

- PV dom0 is working fine. I couldn't test physical cpu hotplug, but removing
and then re-adding vcpus to dom0 worked.

- PV domU is working fine, too. A test starting with 2 vcpus initially
and onlining another 2 vcpus later worked fine.

So for Xen PV you can add my:

Tested-by: Juergen Gross <jgross@xxxxxxxx>

One other thing to mention: with this series the reported topology via "lscpu"
and "cat /proc/cpuinfo" inside a PV guest/dom0 is looking sane for the first
time. :-)

Thanks for this significant improvement!


Juergen
