[PATCH v6 0/8] Add support for Sub-NUMA cluster (SNC) systems

From: Tony Luck
Date: Thu Sep 28 2023 - 15:14:06 EST


The Sub-NUMA cluster feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch series, Intel advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID
counters in the same way. This allows the monitoring features
to be used, with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately.

Note that this patch series improves resctrl reporting considerably
on systems with SNC enabled, but there will still be some anomalies
for processes accessing memory from other sub-NUMA nodes.
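
As a rough illustration of what partitioning the RMID counters between
SNC nodes could look like, the sketch below assumes the counters are
divided equally and contiguously among the nodes that share an L3 cache.
That layout, and every name in it, is an assumption made for
illustration and is not taken from the patches themselves.

```c
#include <assert.h>

/*
 * Illustrative only: assumes the hardware RMID counters are split
 * equally and contiguously between the SNC nodes sharing one L3 cache.
 * All names here are hypothetical, not taken from the patches.
 */
struct rmid_range {
	unsigned int first;	/* first RMID counter owned by the node */
	unsigned int last;	/* last RMID counter owned by the node */
};

/* Number of RMID counters each SNC node can use. */
static unsigned int rmids_per_node(unsigned int total_rmids,
				   unsigned int nodes_per_l3)
{
	assert(nodes_per_l3 > 0);
	return total_rmids / nodes_per_l3;
}

/* Range of counters owned by SNC node 'node' (0-based within the L3). */
static struct rmid_range node_rmid_range(unsigned int total_rmids,
					  unsigned int nodes_per_l3,
					  unsigned int node)
{
	unsigned int per_node = rmids_per_node(total_rmids, nodes_per_l3);
	struct rmid_range r;

	assert(node < nodes_per_l3);
	r.first = node * per_node;
	r.last = r.first + per_node - 1;
	return r;
}
```

Under these assumptions, 400 counters and two SNC nodes per L3 cache
would give node 0 the counters 0..199 and node 1 the counters 200..399.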

Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>

---

Summary of changes since v5 - see each patch commit for more specifics

Rebased to v6.6-rc3

0001	Define "scope" enum with values 2, 3 for caches to simplify some
	code (but sanity check before each such usage).
	Better warning messages when scope lookup fails

0002	New patch so that some code can be shared between looking up
	control and monitor domains

0003	Spell "mondomains" as "mon_domains" and be consistent with all
	the other "mon" identifiers also having similar "_".
	Don't leave control stuff with old names, change those too
	so now have ctrl_scope, ctrl_domains, etc.

0004	Use infrastructure from 0002 to have a common rdt_find_domain()
	function for both types of domain structure.
	0003 was using same "rdt_domain" structure for both control
	and monitor domains. Divide it into rdt_ctrl_domain and
	rdt_mon_domain structures with just the fields they need.
	Ditto for rdt_hw_domain. Also split and rename many support
	functions and macros.
	Lots of "fir tree local declaration order" changes because
	lengths of typenames changed.

0005	Better commit description

0006	Better commit and code comments

0007	More explanations in commit and code comments.
	Use consistent naming for "snc_*()" functions.
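
The 0001 entry above mentions a "scope" enum whose cache values are 2
and 3, with a sanity check before each use. A minimal sketch of that
idea follows; the identifiers are hypothetical and the real patch may
differ. It assumes the values were chosen to equal the cache level, so
a validated scope can be used directly as a cache level.

```c
#include <assert.h>

/* Hypothetical names; the values are assumed to equal the cache level. */
enum scope {
	SCOPE_L2_CACHE = 2,
	SCOPE_L3_CACHE = 3,
};

/*
 * Return the cache level for a scope, or -1 if the scope does not
 * describe a cache. Callers are expected to check for -1 before use.
 */
static int scope_to_cache_level(int scope)
{
	switch (scope) {
	case SCOPE_L2_CACHE:
	case SCOPE_L3_CACHE:
		return scope;	/* enum value doubles as the cache level */
	default:
		return -1;	/* sanity check failed */
	}
}
```

The check matters once a non-cache scope is added to the enum, since
such a value would no longer map to a cache level.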

Patch to update selftests dropped from this series. Someone else
has taken over that work.

Tony Luck (8):
  x86/resctrl: Prepare for new domain scope
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for different scope for control/monitor
    operations
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Sub NUMA Cluster detection and enable
  x86/resctrl: Update documentation with Sub-NUMA cluster changes

 Documentation/arch/x86/resctrl.rst        |  34 +-
 include/linux/resctrl.h                   |  78 +++--
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/kernel/cpu/resctrl/internal.h    |  66 ++--
 arch/x86/kernel/cpu/resctrl/core.c        | 380 +++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  52 +--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  58 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  14 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 131 ++++----
 9 files changed, 567 insertions(+), 247 deletions(-)


base-commit: 6465e260f48790807eef06b583b38ca9789b6072
--
2.41.0