Re: [PATCH 2/4] selftests/resctrl: SNC support for CMT

From: Ilpo Järvinen
Date: Fri Mar 08 2024 - 08:53:38 EST


On Wed, 6 Mar 2024, Maciej Wieczor-Retman wrote:

> Cache Monitoring Technology (CMT) works by measuring how much data in L3
> cache is occupied by a given process identified by its Resource
> Monitoring ID (RMID).
>
> On systems with Sub-Numa Clusters (SNC) enabled, a process can occupy
> not only the cache that belongs to its own NUMA node but also pieces of
> other NUMA nodes' caches that lie on the same socket.
>
> A simple correction to make the CMT selftest NUMA-aware is to sum values
> reported by all nodes on the same socket for a given RMID.
>
> Reported-by: "Shaopeng Tan (Fujitsu)" <tan.shaopeng@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/TYAPR01MB6330B9B17686EF426D2C3F308B25A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@xxxxxxxxx>
> ---
> tools/testing/selftests/resctrl/cache.c | 17 +++++++++++------
> tools/testing/selftests/resctrl/resctrl.h | 4 +++-
> tools/testing/selftests/resctrl/resctrl_val.c | 9 ++++++---
> 3 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> index 1b339d6bbff1..dab81920033b 100644
> --- a/tools/testing/selftests/resctrl/cache.c
> +++ b/tools/testing/selftests/resctrl/cache.c
> @@ -161,16 +161,21 @@ int perf_event_measure(int pe_fd, struct perf_event_read *pe_read,
> *
> * Return: =0 on success. <0 on failure.
> */
> -int measure_llc_resctrl(const char *filename, int bm_pid)
> +int measure_llc_resctrl(const char *filename, int bm_pid, const char *ctrlgrp,
> + const char *mongrp, int res_id)
> {
> - unsigned long llc_occu_resc = 0;
> + unsigned long sum = 0, llc_occu_resc = 0;
> int ret;
>
> - ret = get_llc_occu_resctrl(&llc_occu_resc);
> - if (ret < 0)
> - return ret;
> + for (int i = 0 ; i < snc_ways() ; i++) {

Spaces as per usual coding style:

for (int i = 0; i < snc_ways(); i++) {

> + set_cmt_path(ctrlgrp, mongrp, res_id + i);
> + ret = get_llc_occu_resctrl(&llc_occu_resc);
> + if (ret < 0)
> + return ret;
> + sum += llc_occu_resc;
> + }
>
> - return print_results_cache(filename, bm_pid, llc_occu_resc);
> + return print_results_cache(filename, bm_pid, sum);
> }
>
> /*

> @@ -828,6 +828,8 @@ int resctrl_val(const struct resctrl_test *test,
> sleep(1);
>
> /* Test runs until the callback setup() tells the test to stop. */
> + get_domain_id("L3", uparams->cpu, &res_id);

Hardcoding "L3" here makes this function less generic. You don't even
need to do it: get_domain_id() does the "MB" -> "L3" transformation
implicitly, so you can just pass test->resource instead.
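
To illustrate the kind of mapping meant here, a minimal standalone sketch
follows. resource_to_cache_level() is a hypothetical name made up for this
example and is not the selftest's actual helper; it only shows how a lookup
that treats "MB" like "L3" lets a caller pass the resource name unchanged.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not from the selftests): map a resctrl resource
 * name to the cache level whose domain it shares. "MB" is treated like
 * "L3", so callers do not need to hardcode "L3" themselves.
 */
static int resource_to_cache_level(const char *resource)
{
	if (!strcmp(resource, "MB"))
		resource = "L3";

	if (!strcmp(resource, "L3"))
		return 3;
	if (!strcmp(resource, "L2"))
		return 2;

	return -1;	/* unknown resource */
}
```

With a lookup like this, a caller holding either "MB" or "L3" ends up with
the same cache level, which is why passing test->resource is enough.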

Also, I don't understand why you're again making the naming inconsistent
by introducing "res_id".

If you based this on top of the patches I just posted, resctrl_val()
already has the domain_id variable.
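
For reference, the summing done by the quoted patch can be sketched in
isolation as below, using the domain_id naming. This is only an
illustration under stated assumptions: sum_llc_occupancy(), occu_reader_t
and fake_reader() are hypothetical names invented for this example, and the
fake reader stands in for reading the per-node occupancy files. The
scaling of the domain id by the number of SNC nodes mirrors what the
quoted patch does.

```c
#include <assert.h>

/* Hypothetical callback: report the occupancy of one monitoring domain. */
typedef int (*occu_reader_t)(int node_id, unsigned long *occupancy);

/*
 * Sum the occupancy reported by every SNC node belonging to the L3 domain
 * domain_id. The first node id is domain_id * snc_ways, as in the quoted
 * patch. Returns 0 on success, <0 if any read fails.
 */
static int sum_llc_occupancy(int domain_id, int snc_ways,
			     occu_reader_t read_occu, unsigned long *sum)
{
	unsigned long total = 0, val;
	int first = domain_id * snc_ways;
	int ret;

	for (int i = 0; i < snc_ways; i++) {
		val = 0;
		ret = read_occu(first + i, &val);
		if (ret < 0)
			return ret;
		total += val;
	}

	*sum = total;
	return 0;
}

/* Illustration only: node N reports (N + 1) * 1000 bytes. */
static int fake_reader(int node_id, unsigned long *occupancy)
{
	if (node_id < 0)
		return -1;

	*occupancy = (unsigned long)(node_id + 1) * 1000;
	return 0;
}
```

With domain_id 1 and two SNC nodes per socket, the sketch reads nodes 2
and 3 and adds their values together.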

--
i.

> + res_id *= snc_ways();
> while (1) {
> ret = param->setup(test, uparams, param);
> if (ret == END_OF_TESTS) {
> @@ -844,7 +846,8 @@ int resctrl_val(const struct resctrl_test *test,
> break;
> } else if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
> sleep(1);
> - ret = measure_llc_resctrl(param->filename, bm_pid);
> + ret = measure_llc_resctrl(param->filename, bm_pid, param->ctrlgrp,
> + param->mongrp, res_id);
> if (ret)
> break;
> }
>