[PATCH v4 0/7] Add support for Sub-NUMA cluster (SNC) systems

From: Tony Luck
Date: Sat Jul 22 2023 - 15:07:57 EST


The Sub-NUMA cluster feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch series, Intel advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID
counters in the same way. This allows the monitoring features
to be used (with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately).

Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>

---

Changes since v3:

Reinette provided the most excellent suggestion that this series
could better achieve its objective if it enabled separate domain
lists for control & monitoring within a resource, rather than
creating a whole new resource to support the separate node scope needed
for SNC monitoring. Thus all the preamble patches from the previous
version have gone, replaced by patches 1-4 of this new series.

Note to anyone backporting this to an older Linux kernel version:
you may be able to skip patches 2-4. These provide separate domain
structures for control and monitoring with just the fields needed for
each, but this is largely cosmetic.

Of the code from v3 that survived to v4, the following changes have
been made (also from Reinette's review of v3).

1) Rename "snc_ways" to "snc_nodes_per_l3_cache" to avoid the confusing
use of "ways" which means something entirely different when talking
about caches.
2) Move the #define for MSR_RMID_SNC_CONFIG to <asm/msr-index.h> along
with all the other RDT MSRs.
3) Don't use a per-CPU variable "rmid_offset". Just calculate the value
needed at the one place where it is used.
4) Don't create an entire resource structure with package scoped domains
just to set the SNC MSR.
5) Add a comment to the commit message about adjusting the value shown in
the "size" files in each resctrl ctrl_mon directory.

This one is not from Reinette:
6) Prevent mounting in "mba_MBps" mode when SNC mode is enabled. This
would just be confusing since monitoring is done at the node scope while
control is still at package scope.

Tony Luck (7):
x86/resctrl: Create separate domains for control and monitoring
x86/resctrl: Split the rdt_domain structures
x86/resctrl: Change monitor code to use rdt_mondomain
x86/resctrl: Delete unused fields from struct rdt_domain
x86/resctrl: Determine if Sub-NUMA Cluster is enabled and initialize.
x86/resctrl: Update documentation with Sub-NUMA cluster changes
selftests/resctrl: Adjust effective L3 cache size when SNC enabled

Documentation/arch/x86/resctrl.rst | 10 +-
include/linux/resctrl.h | 50 +++-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 40 ++-
tools/testing/selftests/resctrl/resctrl.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 289 ++++++++++++++++----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 6 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 58 ++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 54 ++--
tools/testing/selftests/resctrl/resctrlfs.c | 57 ++++
10 files changed, 427 insertions(+), 139 deletions(-)


base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.40.1