Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket

From: Borislav Petkov
Date: Tue Jun 27 2017 - 06:48:38 EST


Remove stable from CC - this is not how you do a stable submission.

See Documentation/process/stable-kernel-rules.rst

On Tue, Jun 27, 2017 at 01:40:52AM -0500, Suravee Suthikulpanit wrote:
> According to the Documentation/x86/topology.txt, AMD nomenclature for
> package is NUMA node (or die).

You guys keep talking about die so you should add the definition of
"die" to that document so that we're all on the same page. In a separate
patch.

Which reminds me, this patch does a gazillion things. It needs to be
split.

> However, this is not the case on AMD family17h multi-die processor
> platforms, which can have up to 4 dies per socket as shown in the
> following system topology.

So what exactly does that mean?

A die is a package on ZN and you can have up to 4 packages on a physical
socket?

Please be specific and verbose.

> Die (Dx) View :
> ----------------------------
> C0 | T0 T1 | || | T0 T1 | C4
> --------| || |--------
> C1 | T0 T1 | L3 || L3 | T0 T1 | C5
> --------| || |--------
> C2 | T0 T1 | #0 || #1 | T0 T1 | C6
> --------| || |--------
> C3 | T0 T1 | || | T0 T1 | C7
> ----------------------------
>
> System View (with 2 socket) :
> --------------------
> | -------------|------
> | | | |
> ------------ ------------
> | D1 -- D0 | | D7 -- D6 |
> | | \/ | | | | \/ | |
> SOCKET0 | | /\ | | | | /\ | | SOCKET1
> | D2 -- D3 | | D4 -- D5 |
> ------------ ------------
> | | | |
> ------|------------| |
> --------------------
>
> Current logic interpretes package as socket (i.e. phys_proc_id is
> socket id), which results in setting x86_has_numa_in_package, and omits
> the DIE schedule domain.

Well, this is what we do on MCM Magny-Cours. Btw, here's where you
explain *why* you need the DIE domain.

> However, NUMA schedule domains are derived from
> SRAT/SLIT, which assumes NUMA node is a die,

Ok, so I don't understand this: SRAT/SLIT should basically say that you
have 4 *logical* NUMA nodes on that *physical* socket. Is that what you
want?

Or is not what you want?

Because it makes sense to me that BIOS should describe how the logical
grouping of the cores in a node is. And x86_numa_in_package_topology
roughly means that we rely on SRAT/SLIT to tell us the topology.

So why can't SRAT/SLIT do that just like they did on Magy-Cours MCM
packages?

> and build NUMA schedule
> domains on top of NUMA nodes. This results in incomplete schedule domains
> as following:
> domain 0: SMT
> domain 1: MC /* core complex w/ shared L3*/
> ---- Missing DIE level domain ----
> domain 2: NUMA /* socket */
> domain 3: NUMA /* platform */
>
> Presenting package-as-die does not set x86_has_numa_in_package.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
> Signed-off-by: Leo Duran <leo.duran@xxxxxxx>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>

This SOB chain is wrong, of course. Lemme guess, you want to say that
you all three worked on this patch, right?

Well, this is not how we express that. Please read this:

Documentation/process/submitting-patches.rst, section 11

And while you're at it, read section 3 too pls.

> Cc: <stable@xxxxxxxxxxxxxxx> # v4.10+
> ---
> arch/x86/kernel/cpu/amd.c | 189 +++++++++++++++++++++++++++-------------------
> 1 file changed, 112 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index bb5abe8..2f5869c 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -1,3 +1,5 @@
> +#define pr_fmt(fmt) "x86/AMD: " fmt
> +

This needs to go in a separate patch along with auditing all pr_* calls
in that file and removing any other prefixes they have.

> #include <linux/export.h>
> #include <linux/bitops.h>
> #include <linux/elf.h>
> @@ -32,6 +34,12 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
> */
> static u32 nodes_per_socket = 1;
>
> +/*
> + * l3_num_threads_sharing: Stores the number of threads sharing L3 cache.
> + * Refer to CPUID_Fn8000001D_EAX_x03 [Cache Properties (L3)] NumSharingCache.
> + */
> +static u32 l3_num_threads_sharing;

u32 is clearly too much. I know, nodes_per_socket is u32 too but that
should be turned into an u8 too. In a separate patch.

> +
> static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
> {
> u32 gprs[8] = { 0 };
> @@ -296,96 +304,122 @@ static int nearby_node(int apicid)
> }
> #endif
>
> +#ifdef CONFIG_SMP
> +

This is an unnecessary move.

> /*
> - * Fixup core topology information for
> - * (1) AMD multi-node processors
> + * Per Documentation/x86/topology.c,

$ ls Documentation/x86/topology.c
ls: cannot access 'Documentation/x86/topology.c': No such file or directory

> + * the kernel works with
> + * {packages, cores, threads}, and we will map:
> + *
> + * thread = core in compute-unit (CMT), or thread in core (SMT)
> + * core = compute-unit (CMT), or core (SMT)
> + * package = node (die)

Put all those definitions into Documentation/x86/topology.txt and point
only to that file in the comment here.

> + *
> + * Discover topology based on available information from CPUID first,
> + * and only derive them as needed.
> + *
> + * (1) phys_proc_id is die ID in AMD multi-die processors.
> * Assumption: Number of cores in each internal node is the same.

So that's wrong:

- cpuinfo_x86.phys_proc_id:

The physical ID of the package. This information is retrieved via CPUID
and deduced from the APIC IDs of the cores in the package.

> - * (2) AMD processors supporting compute units
> + * (2) cpu_core_id is derived from either CPUID topology extension
> + * or initial APIC_ID.
> + * (3) cpu_llc_id is either L3 or per-node

Move to Documentation/x86/topology.txt

> */
> -#ifdef CONFIG_SMP
> static void amd_get_topology(struct cpuinfo_x86 *c)
> {
> - u8 node_id;
> int cpu = smp_processor_id();
>
> - /* get information required for multi-node processors */
> if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
> u32 eax, ebx, ecx, edx;
>
> cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
>
> - node_id = ecx & 0xff;
> + c->phys_proc_id = ecx & 0xff;
> smp_num_siblings = ((ebx >> 8) & 0xff) + 1;
>
> - if (c->x86 == 0x15)
> - c->cu_id = ebx & 0xff;

You can't remove this, of course, see

79a8b9aa388b ("x86/CPU/AMD: Bring back Compute Unit ID")

And so on and so on.

So I'm going to stop reviewing here because:

* this patch needs to be split
* I still don't understand why we need all that dance and not describe the
topology properly with SRAT/SLIT

Thanks.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.