[PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems

From: Tony Luck
Date: Tue Jan 30 2024 - 17:20:52 EST


This is the re-worked version of this series that I promised to post
yesterday. Check that e-mail for the arguments for this alternate
approach.

https://lore.kernel.org/all/ZbhLRDvZrxBZDv2j@agluck-desk3/

Apologies to Drew Fustini who I'd somehow dropped from later versions
of this series. Drew: you had made a comment at one point that having
different scopes within a single resource may be useful on RISC-V.
Version 14 included that, but it's gone here. Maybe multiple resctrl
"struct resource" for a single h/w entity like L3 as I'm doing in this
version could work for you too?

Patches 1-5 are almost completely rewritten based around the new
idea to give CMT and MBM their own "resource" instead of sharing
one with L3 CAT. This removes the need for separate domain lists,
and thus most of the churn of the previous version of this series.
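
To illustrate the idea, here is a minimal user-space sketch of what
"one resource per function" could look like. It is not the code from
these patches: every type, field, and function name below is
hypothetical, and only the concept (a separate monitoring resource
with its own scope and its own single domain list) comes from the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model. Not the kernel's actual definitions. */
enum scope { SCOPE_L3_CACHE, SCOPE_NODE };
enum res_id { RES_L3, RES_L3_MON, RES_COUNT };

struct domain {
	int id;              /* cache id or node id, depending on scope */
	struct domain *next; /* next domain of the same resource */
};

struct resource {
	const char *name;
	enum scope scope;       /* what one domain of this resource covers */
	int alloc_capable;
	int mon_capable;
	struct domain *domains; /* the resource's single domain list */
};

/* Walk a resource's domain list and count its entries. */
size_t count_domains(const struct resource *r)
{
	size_t n = 0;

	assert(r != NULL);
	for (const struct domain *d = r->domains; d != NULL; d = d->next)
		n++;
	return n;
}

/* Model one L3 cache (id 0) split into two SNC nodes (ids 0 and 1). */
void build_example(struct resource res[RES_COUNT], struct domain doms[3])
{
	doms[0] = (struct domain){ .id = 0, .next = NULL };     /* L3 cache 0 */
	doms[1] = (struct domain){ .id = 0, .next = &doms[2] }; /* SNC node 0 */
	doms[2] = (struct domain){ .id = 1, .next = NULL };     /* SNC node 1 */

	res[RES_L3] = (struct resource){
		.name = "L3", .scope = SCOPE_L3_CACHE,
		.alloc_capable = 1, .mon_capable = 0, .domains = &doms[0],
	};
	res[RES_L3_MON] = (struct resource){
		.name = "L3_MON", .scope = SCOPE_NODE,
		.alloc_capable = 0, .mon_capable = 1, .domains = &doms[1],
	};
}
```

In this model the allocation resource keeps one domain per L3 cache
while the monitoring resource has one domain per SNC node, and neither
needs a second list.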

Patches 6-8 are largely unchanged, but I removed all the Reviewed-by
and Tested-by tags since they are now built on a completely different
base.

Patches are against tip x86/cache:

fc747eebef73 ("x86/resctrl: Remove redundant variable in mbm_config_write_domain()")

Re-work doesn't affect the v14 cover letter, so pasting it here:

The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters in
the same way. This allows monitoring features to be used, with the caveat
that users must be aware that Linux may migrate tasks more frequently
between SNC nodes than between "regular" NUMA nodes, so reading counters
from all SNC nodes may be needed to get a complete picture of activity
for tasks.
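
As a rough illustration of that caveat, the sketch below adds up one
monitoring event across all SNC nodes that share an L3 cache. The
function name and the array of per-node values are hypothetical; how
the per-node values are read from resctrl is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum one monitoring event (e.g. a byte count) over every SNC node
 * sharing an L3 cache. per_node[i] is the value already read for
 * node i; obtaining it is outside the scope of this sketch.
 */
uint64_t snc_total(const uint64_t *per_node, size_t nr_nodes)
{
	uint64_t total = 0;

	assert(per_node != NULL || nr_nodes == 0);
	for (size_t i = 0; i < nr_nodes; i++)
		total += per_node[i];
	return total;
}
```

A task that ran partly on each of two SNC nodes would show only part
of its activity in either node's counter; the sum covers both.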

Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>

Tony Luck (8):
  x86/resctrl: Split the RDT_RESOURCE_L3 resource
  x86/resctrl: Move all monitoring functions to RDT_RESOURCE_L3_MON
  x86/resctrl: Prepare for non-cache-scoped resources
  x86/resctrl: Add helper function to look up domain_id from scope
  x86/resctrl: Add "NODE" as an option for resource scope
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Sub NUMA Cluster detection and enable
  x86/resctrl: Update documentation with Sub-NUMA cluster changes

 Documentation/arch/x86/resctrl.rst        |  25 ++-
 include/linux/resctrl.h                   |  10 +-
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/kernel/cpu/resctrl/internal.h    |   3 +
 arch/x86/kernel/cpu/resctrl/core.c        | 181 +++++++++++++++++++++-
 arch/x86/kernel/cpu/resctrl/monitor.c     |  28 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   6 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  12 +-
 8 files changed, 236 insertions(+), 30 deletions(-)


base-commit: fc747eebef734563cf68a512f57937c8f231834a
--
2.43.0