Re: [RFC][PATCH] x86, sched: allow topologies where NUMA nodes share an LLC

From: Peter Zijlstra
Date: Tue Nov 07 2017 - 03:30:38 EST


On Mon, Nov 06, 2017 at 02:15:00PM -0800, Dave Hansen wrote:

> But, the CPUID for the SNC configuration discussed above enumerates
> the LLC as being shared by the entire package. This is not 100%
> precise because the entire cache is not usable by all accesses. But,
> it *is* the way the hardware enumerates itself, and this is not likely
> to change.

So CPUID and SRAT will remain inconsistent; even in future products?
That would absolutely blow chunks.

If that is the case, we'd best use a fake feature like
X86_BUG_TOPOLOGY_BROKEN and use that instead of an ever growing list of
models in this code.
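Roughly like this (a standalone sketch only; the struct, the quirk table, and detect_topology_bug() are hypothetical stand-ins for the kernel's x86_match_cpu()/cpu-bug machinery, and 0x55 mirrors INTEL_FAM6_SKYLAKE_X):

```c
#include <stddef.h>

/* Synthetic "bug" flag, analogous to the X86_BUG_* bits. */
#define X86_BUG_TOPOLOGY_BROKEN (1UL << 0)

enum { VENDOR_INTEL, VENDOR_AMD };

struct fake_cpu {
	unsigned int vendor;
	unsigned int family;
	unsigned int model;
	unsigned long bugs;	/* bit mask of synthetic bug flags */
};

/* The quirk list lives in exactly one place ... */
static const struct { unsigned int vendor, family, model; } broken_topo[] = {
	{ VENDOR_INTEL, 6, 0x55 /* SKYLAKE_X */ },
	/* future affected models go here, not in match_llc() */
};

static void detect_topology_bug(struct fake_cpu *c)
{
	for (size_t i = 0; i < sizeof(broken_topo) / sizeof(broken_topo[0]); i++) {
		if (c->vendor == broken_topo[i].vendor &&
		    c->family == broken_topo[i].family &&
		    c->model  == broken_topo[i].model)
			c->bugs |= X86_BUG_TOPOLOGY_BROKEN;
	}
}
```

... and match_llc() then only tests the flag, so new affected parts mean a one-line table entry instead of another model check in the topology code.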

> +/*
> + * Set if a package/die has multiple NUMA nodes inside.
> + * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
> + * Sub-NUMA Clustering have this.
> + */
> +static bool x86_has_numa_in_package;
> +
> static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>
> + /* Do not match if we do not have a valid APICID for cpu: */
> + if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
> + return false;
> +
> + /* Do not match if LLC id does not match: */
> + if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
> + return false;
>
> + /*
> + * Some Intel CPUs enumerate an LLC that is shared by
> + * multiple NUMA nodes. The LLC on these systems is
> + * shared for off-package data access but private to the
> + * NUMA node (half of the package) for on-package access.
> + *
> + * CPUID can only enumerate the cache as being shared *or*
> + * unshared, but not this particular configuration. The
> + * CPU in this case enumerates the cache to be shared
> + * across the entire package (spanning both NUMA nodes).
> + */
> + if (!topology_same_node(c, o) &&
> + (c->x86_model == INTEL_FAM6_SKYLAKE_X)) {

This needs a c->x86_vendor test; imagine the fun when AMD releases a
part with model == SKX ...
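That is, something like this (illustrative sketch; VENDOR_* is a stand-in for the kernel's X86_VENDOR_* constants and 0x55 for INTEL_FAM6_SKYLAKE_X):

```c
#include <stdbool.h>

enum { VENDOR_INTEL, VENDOR_AMD };

/*
 * Model numbers are only unique per vendor (and family), so all three
 * fields must match before a CPU can be treated as SKX.
 */
static bool is_skx(unsigned int vendor, unsigned int family, unsigned int model)
{
	return vendor == VENDOR_INTEL && family == 6 && model == 0x55;
}
```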

> + /* Use NUMA instead of coregroups for scheduling: */
> + x86_has_numa_in_package = true;
> +
> + /*
> + * Now, tell the truth, that the LLC matches. But,
> + * note that throwing away coregroups for
> + * scheduling means this will have no actual effect.
> + */
> + return true;

What are the ramifications here? Is anybody else using that cpumask
outside of the scheduler topology setup?

> + }
> +
> + return topology_sane(c, o, "llc");
> }