Re: [Patch v2 2/2] x86/tsc: use logical_packages as a better estimation of socket numbers

From: Feng Tang
Date: Sun Jun 25 2023 - 10:59:22 EST


On Fri, Jun 23, 2023 at 11:04:34PM +0800, Feng Tang wrote:
> Hi Thomas,
>
> On Fri, Jun 23, 2023 at 01:07:24AM +0200, Thomas Gleixner wrote:
> > On Thu, Jun 22 2023 at 16:27, Thomas Gleixner wrote:
> > > On Fri, Jun 16 2023 at 15:18, Feng Tang wrote:
> > > So something like the below should just work.
> >
> > Well it works in principle, but does not take any of the command line
> > parameters which limit nr_possible CPUs or the actual kernel
> > configuration into account. But the principle itself works correctly.
> >
> > Below is an updated version, which takes them into account.
> >
> > The data here is from a two socket system with 32 CPUs per socket.
> >
> > No command line parameters (NR_CPUS=64):
> >
> > smpboot: Allowing 64 CPUs, 32 hotplug CPUs
> > clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e3306b9ada, max_idle_ns: 440795224413 ns
> > smp: Brought up 1 node, 32 CPUs
> > smpboot: Max logical packages ACPI enumeration: 2
> >
> > "possible_cpus=32" (NR_CPUS=64) or
> > No command line parameter (NR_CPUS=32):
> >
> > smpboot: Allowing 32 CPUs, 0 hotplug CPUs
> > clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e3306b9ada, max_idle_ns: 440795224413 ns
> > smp: Brought up 1 node, 32 CPUs
> > smpboot: Max logical packages ACPI enumeration: 1
> >
> > maxcpus=32
> > smpboot: Allowing 64 CPUs, 0 hotplug CPUs
> > clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e3306b9ada, max_idle_ns: 440795224413 ns
> > smp: Brought up 1 node, 32 CPUs
> > smpboot: Max logical packages ACPI enumeration: 2
> >
> > But that's really all we should do. If the ACPI table enumerates CPUs as
> > hotpluggable which can never arrive, then so be it.
> >
> > We have enough parameters to override the BIOS nonsense. Trying to do
> > more magic MAD table parsing with heuristics is just wrong.
> >
> > We already have way too many heuristics and workarounds for broken
> > firmware, but for the problem at hand, we really don't need more.
> >
> > The only systems I observed so far which have a non-sensical amount of
> > "hotpluggable" CPUs are high-end server machines. It's a resonable
> > expectation that machines with high-end price tags come with correct
> > firmware. Trying to work around that (except with the existing command
> > line options) is just proliferating this mess. This has to stop.
> >
> > Thanks,
> >
> > tglx
>
> Thanks for helping on this.
>
> I run some tests with your patch againt latest kernel, and found with
> some "maxcpus=" setup, the kernel will soft hung, that it will print
> some hung/stall message from time to time.
>
> My test machine is Cascacade Lake AP, 2 packages (4 NUMA nodes), 96C
> and 192T. The cmdline is "maxcpus=24", and 24 is the number of core
> per NUMA node. Don't know if you can reproduce it with "maxcpus=16"
> on your test box.
>
> The box is in remote lab and I don't have serial console, but a remote
> console, and I took 2 pictures of the error message (attched).
>
> Also I will check more on how to debug on this remote machine.

[ Above mail was auto-rejected by many mail servers due to the big size
of the pictures ]

>From debug, the reason of the hung/stall is detect_extended_topology_early()
is called by cpu hotplug after boot, where there is "maxcpus=XXX" setting,
(#echo 1 > /sys/devices/system/cpu/cpuX/online).

It could be fixed with below patch:
----------------------------------------------------------------
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 828c1f7edac1..1ff73c8c4972 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -29,7 +29,7 @@ unsigned int __max_die_per_package __read_mostly = 1;
EXPORT_SYMBOL(__max_die_per_package);

#ifdef CONFIG_SMP
-unsigned int apic_to_pkg_shift __ro_after_init;
+unsigned int apic_to_pkg_shift;

/*
* Check if given CPUID extended topology "leaf" is implemented

----------------------------------------------------------------

I also tested 'numa=off' and 'numa=fake=8' cmdline parameter on one
2 package Cascad Lake SP and one 2 package (4 NUMA nodes) Cascade
Lake AP, and the code works fine by giving the _correct_ estimation:

"smpboot: Max logical packages ACPI enumeration: 2"

Thanks,
Feng

> Thanks,
> Feng