Re: [PATCH v10 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

From: Reinette Chatre
Date: Mon Nov 06 2023 - 19:33:19 EST


Hi Tony,

On 10/31/2023 2:17 PM, Tony Luck wrote:
> There isn't a simple hardware bit that indicates whether a CPU is
> running in Sub NUMA Cluster (SNC) mode. Infer the state by comparing
> the ratio of NUMA nodes to L3 cache instances.
>
> When SNC mode is detected, reconfigure the RMID counters by updating
> the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen.
>
> Clearing bit zero of the MSR divides the RMIDs and renumbers the ones
> on the second SNC node to start from zero.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
> Changes since v9
> Expand h/w to hardware (commit and code comments)
> Remove "earlier commit" reference
> s/counnter/counter/
> Check for offline CPUs and warn user SNC detection may be broken.
>
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kernel/cpu/resctrl/core.c | 100 ++++++++++++++++++++++++++++-
> 2 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3fa9cecd599..4285a5ee81fe 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1109,6 +1109,7 @@
> #define MSR_IA32_QM_CTR 0xc8e
> #define MSR_IA32_PQR_ASSOC 0xc8f
> #define MSR_IA32_L3_CBM_BASE 0xc90
> +#define MSR_RMID_SNC_CONFIG 0xca0
> #define MSR_IA32_L2_CBM_BASE 0xd10
> #define MSR_IA32_MBA_THRTL_BASE 0xd50
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 97d2a5a7dd41..034f9797e1fb 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -16,11 +16,14 @@
>
> #define pr_fmt(fmt) "resctrl: " fmt
>
> +#include <linux/cpu.h>
> #include <linux/slab.h>
> #include <linux/err.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpuhotplug.h>
> +#include <linux/mod_devicetable.h>
>
> +#include <asm/cpu_device_id.h>
> #include <asm/intel-family.h>
> #include <asm/resctrl.h>
> #include "internal.h"
> @@ -184,10 +187,10 @@ bool is_mba_sc(struct rdt_resource *r)
>
> /*
> * rdt_get_mb_table() - get a mapping of bandwidth(b/w) percentage values
> - * exposed to user interface and the h/w understandable delay values.
> + * exposed to user interface and the hardware understandable delay values.
> *
> * The non-linear delay values have the granularity of power of two
> - * and also the h/w does not guarantee a curve for configured delay
> + * and also the hardware does not guarantee a curve for configured delay
> * values vs. actual b/w enforced.
> * Hence we need a mapping that is pre calibrated so the user can
> * express the memory b/w as a percentage value.

This seems out of place here. If you want to make such a global change
it can be done as a separate patch. For this work it can just consistently
use "hardware" in own areas changed.

Reinette