Re: [PATCH v4 6/7] x86/resctrl: Update documentation with Sub-NUMA cluster changes

From: Reinette Chatre
Date: Fri Aug 11 2023 - 13:33:31 EST


Hi Tony,

On 7/22/2023 12:07 PM, Tony Luck wrote:
> With Sub-NUMA Cluster mode enabled the scope of monitoring resources is
> per-NODE instead of per-L3 cache. Suffixes of directories with "L3" in
> their name refer to Sub-NUMA nodes instead of L3 cache ids.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> Reviewed-by: Peter Newman <peternewman@xxxxxxxxxx>
> ---
> Documentation/arch/x86/resctrl.rst | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index cb05d90111b4..4d9ddb91751d 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -345,9 +345,13 @@ When control is enabled all CTRL_MON groups will also contain:
> When monitoring is enabled all MON groups will also contain:
>
> "mon_data":
> - This contains a set of files organized by L3 domain and by
> - RDT event. E.g. on a system with two L3 domains there will
> - be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
> + This contains a set of files organized by L3 domain or by NUMA
> + node (depending on whether Sub-NUMA Cluster (SNC) mode is disabled
> + or enabled respectively) and by RDT event. E.g. on a system with
> + SNC mode disabled with two L3 domains there will be subdirectories
> + "mon_L3_00" and "mon_L3_01". The numerical suffix refers to the
> + L3 cache id. With SNC enabled the directory names are the same,
> + but the numerical suffix refers to the node id. Each of these
> directories have one file per event (e.g. "llc_occupancy",
> "mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
> files provide a read out of the current value of the event for

I think it would be helpful to add a modified version of the snippet
(from the previous patch's changelog) regarding well-behaved NUMA apps.
With only the text above, users may find it confusing that a single
cache allocation has multiple cache occupancy counters.

This also changes the meaning of the numbers in the directory names.
The documentation already provides guidance on how to find the cache
ID of a logical CPU (see section "Cache IDs"). I think it would be
helpful to add a similar snippet that makes it clear to users how to
map a CPU to its node ID.
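
To illustrate the kind of mapping I have in mind, here is a minimal
sketch. It assumes the usual sysfs layout on a NUMA-enabled kernel,
where /sys/devices/system/cpu/cpuN contains an entry named nodeM for
the node that CPU belongs to. The helper name cpu_to_node() and its
sysfs_root parameter are hypothetical and only there for illustration;
this is not proposed wording for the documentation.

```python
import os
import re


def cpu_to_node(cpu, sysfs_root="/sys"):
    """Return the NUMA node id of a logical CPU, or None if unknown.

    Assumption: the kernel exposes a "nodeM" entry inside
    <sysfs_root>/devices/system/cpu/cpuN for NUMA-enabled systems.
    """
    cpu_dir = os.path.join(sysfs_root, "devices", "system", "cpu", f"cpu{cpu}")
    try:
        entries = os.listdir(cpu_dir)
    except FileNotFoundError:
        # No such CPU directory (or sysfs is not mounted at sysfs_root).
        return None
    for entry in entries:
        match = re.fullmatch(r"node(\d+)", entry)
        if match:
            return int(match.group(1))
    # CPU directory exists but carries no node entry (e.g. non-NUMA kernel).
    return None
```

With SNC enabled, the value returned by cpu_to_node(N) would be the
number to look for in the "mon_L3_XX" directory suffix, instead of the
L3 cache id.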

Reinette