Re: [RFC PATCH V2 0/1] x86: cpu topology fix and question on x86_max_cores

From: Zhang, Rui
Date: Mon Feb 20 2023 - 09:33:23 EST


On Mon, 2023-02-20 at 12:08 +0100, Peter Zijlstra wrote:
> On Mon, Feb 20, 2023 at 11:28:55AM +0800, Zhang Rui wrote:
>
> > Questions on how to fix cpuinfo_x86.x86_max_cores
> > -------------------------------------------------
> >
> > Fixing x86_max_cores is more complex. Current kernel uses below logic to
> > get x86_max_cores
> >        x86_max_cores = cpus_in_a_package / smp_num_siblings
> > But
> > 1. There is a known bug in CPUID.1F handling code. Thus cpus_in_a_package
> >    can be bogus. To fix it, I will add CPUID.1F Module level support.
> > 2. x86_max_cores is set and used in an inconsistent way in current kernel.
> >    In short, smp_num_siblings/x86_max_cores
> >    2.1 represents the number of maximum *addressable* threads/cores in a
> >        core/package when retrieved via CPUID 1 and 4 on old platforms.
> >        CPUID.1 EBX 23:16 "Maximum number of addressable IDs for logical
> >        processors in this physical package".
> >        CPUID.4 EAX 31:26 "Maximum number of addressable IDs for processor
> >        cores in the physical package".
> >    2.2 represents the number of maximum *possible* threads/cores in a
> >        core/package, when retrieved via CPUID.B/1F on non-Hybrid platforms.
> >        CPUID.B/1F EBX 15:0 "Number of logical processors at this level type.
> >        The number reflects configuration as shipped by Intel".
> >        For example, in calc_llc_size_per_core()
> >            do_div(llc_size, c->x86_max_cores);
> >        x86_max_cores is used as the max *possible* cores in a package.
> >    2.3 is used in a conflict way on other vendors like AMD by checking the
> >        code. I need help on confirming the proper behavior for AMD.
> >        For example, in amd_get_topology(),
> >            c->x86_coreid_bits = get_count_order(c->x86_max_cores);
> >        x86_max_cores is used as the max *addressable* cores in a package.
> >        in get_nbc_for_node(),
> >            cores_per_node = (c->x86_max_cores * smp_num_siblings) /
> >                             amd_get_nodes_per_socket();
> >        x86_max_cores is used as the max *possible* cores in a package.
> > 3. using
> >        x86_max_cores = cpus_in_a_package / smp_num_siblings
> >    to get the number of maximum *possible* cores in a package during boot
> >    cpu bringup is not applicable on platforms with asymmetric cores.
> >    Because, for a given number of threads, we don't know how many of the
> >    threads are the master thread or the only thread of a core, and how
> >    many of them are SMT siblings.
> >    For example, on a platform with 6 Pcore and 8 Ecore, there are 20
> >    threads. But setting x86_max_cores to 10 is apparently wrong.
> >
> > Given the above situation, I have below question and any input is really
> > appreciated.
> >
> > Is this inconsistency a problem or not?
>
> IIRC x86_max_cores in specific is only ever used in arch specific code,
> the pmu uncore drivers and things like that (grep shows MCE).

Do you mean that, as long as the usage of x86_max_cores matches its
definition for a given vendor/generation, the definition of
x86_max_cores can be inconsistent?

I was thinking about how to make it consistent.
For Intel platforms, defining x86_max_cores as the maximum number of
*addressable* cores in a package is not a problem in most cases. The
one exception is calc_llc_size_per_core() in
	arch/x86/kernel/cpu/microcode/intel.c
which needs the number of *possible* cores to get the per-core LLC size.
But maybe we can improve this by replacing x86_max_cores with
boot_cpu_data.booted_cores, since doing a microcode update requires
all the cores to be online?

I don't know the answer for other x86 vendors.
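
To make the addressable-vs-possible distinction concrete, here is a
minimal userspace sketch of the two derivations discussed above. It is
not kernel code: legacy_max_cores() and count_order() are hypothetical
helper names, and count_order() only mirrors what get_count_order()
computes for counts >= 1.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's get_count_order():
 * smallest order such that (1 << order) >= count, for count >= 1.
 */
unsigned int count_order(unsigned int count)
{
	unsigned int order = 0;

	assert(count >= 1);
	while ((1u << order) < count)
		order++;
	return order;
}

/*
 * The derivation quoted above:
 *   x86_max_cores = cpus_in_a_package / smp_num_siblings
 * It is only exact when every core has the same number of threads.
 */
unsigned int legacy_max_cores(unsigned int cpus_in_a_package,
			      unsigned int smp_num_siblings)
{
	assert(smp_num_siblings > 0);
	return cpus_in_a_package / smp_num_siblings;
}
```

For the 6 Pcore + 8 Ecore example, legacy_max_cores(20, 2) gives 10,
while the package really has 14 cores, so dividing an LLC size by the
result would overstate the per-core share.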

>
> Also, perhaps you want to look at calculate_max_logical_packages(). That
> has a comment about there not being heterogeneous systems :/

yeah, I noticed this previously.

ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);

The DIV_ROUND_UP() makes it work on current systems with asymmetric
cores. But
1. if a core can support more than 2 HT siblings, this can break when
   there are multiple symmetric packages.
2. if the system has asymmetric packages, this can break.
   So far we don't have such platforms.
3. it can also break when the 'maxcpus' boot option is used, because
   booted_cores changes.

But ironically, we don't have a better way to get
__max_logical_packages.
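
The arithmetic of that estimate can be tried outside the kernel with a
small sketch like the one below. The function name and its parameters
are made up for illustration; only the two-line calculation comes from
the snippet above, and DIV_ROUND_UP is redefined locally so the block
is self-contained.

```c
#include <assert.h>

/* Local copy of the usual round-up division macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Illustrative userspace version of the estimate:
 *   ncpus = booted_cores * max_smt_threads
 *   packages = DIV_ROUND_UP(total_cpus, ncpus)
 * booted_cores is the core count seen on CPU 0's package.
 */
unsigned int estimate_max_logical_packages(unsigned int total_cpus,
					   unsigned int booted_cores,
					   unsigned int max_smt_threads)
{
	unsigned int ncpus = booted_cores * max_smt_threads;

	assert(ncpus > 0);
	return DIV_ROUND_UP(total_cpus, ncpus);
}
```

With one hybrid package (20 threads, 14 cores, 2 SMT threads max) the
result is DIV_ROUND_UP(20, 28) = 1, so the round-up hides the
overestimated ncpus. As a purely hypothetical 'maxcpus' case, if 32
CPUs are enumerated but only 2 cores were booted on CPU 0's package,
the same formula returns DIV_ROUND_UP(32, 4) = 8.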


> Anyway, the reason I went and had a look there, is because I remember
> Thomas and me spend entirely too much time to try and figure out means
> to size an array for number of packages at boot time and getting it
> wrong too many times to recount.
>
> If only there was a sane way to tell these things without actually
> bringing everything online first :-(

I thought of improving this by parsing all the valid APIC-IDs in the
MADT during BSP bootup, and getting such information by decoding the
APIC-IDs using the APIC-ID layout information retrieved from the BSP.
But this is likely to be a fertile new source of bugs, which is what
Dave is concerned about.
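
As a rough sketch of what such decoding could look like (not a
proposed patch): assume the APIC-IDs have already been collected into
an array, and that pkg_shift is the number of APIC-ID bits below the
package field as read on the BSP. Both the function name and the
assumption that every package shares the BSP's layout are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count distinct package IDs among a list of APIC-IDs.
 * pkg_shift: number of low APIC-ID bits that sit below the package
 * field, assumed identical for all packages.
 */
unsigned int count_packages(const uint32_t *apic_ids, size_t n,
			    unsigned int pkg_shift)
{
	unsigned int count = 0;
	size_t i, j;

	assert(pkg_shift < 32);
	for (i = 0; i < n; i++) {
		uint32_t pkg = apic_ids[i] >> pkg_shift;

		/* Quadratic scan: fine for a sketch, no allocation needed. */
		for (j = 0; j < i; j++)
			if ((apic_ids[j] >> pkg_shift) == pkg)
				break;
		if (j == i)
			count++;	/* first time this package ID is seen */
	}
	return count;
}
```

This only answers "how many packages"; counting cores per package the
same way would need the SMT shift too, and it is exactly the shared
layout assumption that would fall over on asymmetric packages.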

thanks,
rui