Re: [PATCH v4 0/7] Add support for Sub-NUMA cluster (SNC) systems

From: Drew Fustini
Date: Tue Jul 25 2023 - 23:11:18 EST


On Sat, Jul 22, 2023 at 12:07:33PM -0700, Tony Luck wrote:
> The Sub-NUMA cluster feature on some Intel processors partitions
> the CPUs that share an L3 cache into two or more sets. This plays
> havoc with the Resource Director Technology (RDT) monitoring features.
> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPUs support an MSR that can partition the RMID
> counters in the same way. This allows for monitoring features
> to be used (with the caveat that memory accesses between different
> SNC NUMA nodes may still not be counted accurately).
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>
> ---
>
> Changes since v3:
>
> Reinette provided the most excellent suggestion that this series
> could better achieve its objective if it enabled separate domain
> lists for control & monitoring within a resource, rather than
> creating a whole new resource to support the separate node scope needed
> for SNC monitoring. Thus all the pre-amble patches from the previous
> version have gone, replaced by patches 1-4 of this new series.

[This comment is unrelated to Sub-NUMA support so please disregard if
this is the wrong place to make these comments]

I think that the resctrl interface for RISC-V CBQRI could also benefit
from separate domain lists for control and monitoring.

For example, the bandwidth controller QoS register [1] interface allows
a device to implement both bandwidth usage monitoring and bandwidth
allocation. The resctrl proof-of-concept [2] had to awkwardly create two
domains for each memory controller in our example SoC, one that would
contain the MBA resource and one that would contain the L3 resource to
represent MBM files like local_bytes.

This resulted in a very odd-looking schemata that would be hard for the
user to understand:

# cat /sys/fs/resctrl/schemata
MB:4= 80;6= 80;8= 80
L2:0=0fff;1=0fff
L3:2=ffff;3=0000;5=0000;7=0000

Where:

Domain 0 is L2 cache controller 0 capacity allocation
Domain 1 is L2 cache controller 1 capacity allocation
Domain 2 is L3 cache controller capacity allocation

Domain 4 is Memory controller 0 bandwidth allocation
Domain 6 is Memory controller 1 bandwidth allocation
Domain 8 is Memory controller 2 bandwidth allocation

Domain 3 is Memory controller 0 bandwidth monitoring
Domain 5 is Memory controller 1 bandwidth monitoring
Domain 7 is Memory controller 2 bandwidth monitoring

But there is no value in having the domains that were created for the
purposes of bandwidth monitoring show up in schemata.
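
To make the problem concrete, here is a minimal Python sketch (not part
of the proof-of-concept) that parses the schemata text shown above and
filters out the monitoring-only domains. The helper names
parse_schemata() and control_only() and the MON_ONLY_DOMAINS set are
hypothetical; the set is taken from the domain list above.

```python
# Schemata text copied from the example output above.
SCHEMATA = """\
MB:4= 80;6= 80;8= 80
L2:0=0fff;1=0fff
L3:2=ffff;3=0000;5=0000;7=0000
"""

# Assumption: domains 3, 5 and 7 exist only for bandwidth monitoring,
# as described in the domain list above.
MON_ONLY_DOMAINS = {3, 5, 7}


def parse_schemata(text):
    """Return {resource: {domain_id: value_string}} for each schemata line."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        resource, _, rest = line.partition(":")
        domains = {}
        for entry in rest.split(";"):
            dom_id, _, value = entry.partition("=")
            # Values such as " 80" are padded in the output; strip them.
            domains[int(dom_id)] = value.strip()
        result[resource.strip()] = domains
    return result


def control_only(parsed, mon_only):
    """Drop the domains that carry no allocation control."""
    return {
        res: {d: v for d, v in doms.items() if d not in mon_only}
        for res, doms in parsed.items()
    }


if __name__ == "__main__":
    parsed = parse_schemata(SCHEMATA)
    for res, doms in control_only(parsed, MON_ONLY_DOMAINS).items():
        print(res, doms)
```

With the monitoring-only domains filtered out, the L3 line is left with
only domain 2, which is the only entry a user can meaningfully write to.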

I've not yet fully understood how the new approach in this patch series
could help the situation for CBQRI, but I thought I would mention that
separate lists for control and monitoring might be useful.
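
Purely as an illustration of the idea, and not a description of what this
patch series implements, the Python sketch below models a resource with
two independent domain lists. The names Resource, Domain, ctrl_domains,
mon_domains and schemata_line() are hypothetical, as are the domain IDs.
Only the control list feeds the schemata line, so the monitoring domains
never appear there.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    id: int
    value: str = ""  # allocation value; unused for monitoring domains


@dataclass
class Resource:
    name: str
    # Hypothetical: one list per purpose instead of a single shared list.
    ctrl_domains: List[Domain] = field(default_factory=list)
    mon_domains: List[Domain] = field(default_factory=list)

    def schemata_line(self) -> str:
        """Render a schemata line from the control domains only."""
        entries = ";".join(f"{d.id}={d.value}" for d in self.ctrl_domains)
        return f"{self.name}:{entries}"


if __name__ == "__main__":
    # Three memory controllers, each providing both allocation and
    # monitoring, using the same (made-up) domain ID in both lists.
    mb = Resource(
        "MB",
        ctrl_domains=[Domain(0, "80"), Domain(1, "80"), Domain(2, "80")],
        mon_domains=[Domain(0), Domain(1), Domain(2)],
    )
    print(mb.schemata_line())
```

In this model each memory controller would need only one domain ID, and
no placeholder L3 entries would be required to expose the MBM files.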

Thanks,
Drew

[1] https://github.com/riscv-non-isa/riscv-cbqri/blob/main/qos_bandwidth.adoc
[2] https://lore.kernel.org/linux-riscv/20230419111111.477118-1-dfustini@xxxxxxxxxxxx/