Re: [PATCH 10/24] selftests/resctrl: Remove nested calls in perf event handling

From: Ilpo Järvinen
Date: Fri Nov 03 2023 - 05:45:44 EST


On Thu, 2 Nov 2023, Reinette Chatre wrote:

> Hi Ilpo,
>
> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
> > Perf event handling has functions that are the sole caller of another
> > perf event handling related function:
> > - reset_enable_llc_perf() calls perf_event_open_llc_miss()
> > - reset_enable_llc_perf() calls ioctl_perf_event_ioc_reset_enable()
> > - measure_llc_perf() calls get_llc_perf()
> >
> > Remove the extra layer of calls to make the code easier to follow by
> > moving the code into the calling function.
> >
> > In addition, converts print_results_cache() unsigned long parameter to
> > __u64 that matches the type coming from perf.
>
> Is this referring to work from previous patch?

Yes, this got split into own patch and it unfortunately lingered on in the
changelog. I'll remove it.

> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> ---
> >
> > 1 file changed, 25 insertions(+), 61 deletions(-)
> >
> > diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> > index d39ef4eebc37..208af1ecae28 100644
> > --- a/tools/testing/selftests/resctrl/cache.c
> > +++ b/tools/testing/selftests/resctrl/cache.c

> > -/*
> > - * get_llc_perf: llc cache miss through perf events
> > - * @llc_perf_miss: LLC miss counter that is filled on success
> > - *
> > - * Perf events like HW_CACHE_MISSES could be used to validate number of
> > - * cache lines allocated.
> > - *
> > - * Return: =0 on success. <0 on failure.
> > - */
> > -static int get_llc_perf(__u64 *llc_perf_miss)
> > -{
> > - int ret;
> > -
> > - /* Stop counters after one span to get miss rate */
> > -
> > - ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
> > -
> > - ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
> > - if (ret == -1) {
> > - perror("Could not get llc misses through perf");
> > + fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1, PERF_FLAG_FD_CLOEXEC);
> > + if (fd_lm == -1) {
> > + perror("Error opening leader");
> > + ctrlc_handler(0, NULL, NULL);
>
> I understand you just copied the code here ... but it is not clear to me
> why this particular error handling deserves a ctrlc_handler().

Good catch! It's first time I even notice this line.

It certainly looks a bit too big hammer for error handling. I'll try to
create a separate patch which properly removes it (I suspect all error
handling rollbacks are there already so it can just be removed but I'll
have to confirm that).

> > return -1;
> > }
> >
> > - *llc_perf_miss = rf_cqm.values[0].value;
> > + /* Start counters to log values */
> > + ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0);
> > + ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);
> >
> > return 0;
> > }
> > @@ -166,20 +121,29 @@ static int print_results_cache(char *filename, int bm_pid, __u64 llc_value)
> > return 0;
> > }
> >
> > +/*
> > + * measure_llc_perf: measure perf events
> > + * @bm_pid: child pid that runs benchmark
>
> I expected "bm_pid" to reflect a "benchmark pid" that
> is not unique to the child. Are both parent and child
> not running the benchmark?
>
> Missing doc of a parameter here.
>
> > + *
> > + * Measure things like cache misses from perf events.
>
> "things like cache misses" is vague. The function's name
> still contains "llc" which makes me think it is not quite
> generic yet.

I suppose I tried to be intentionally vague because I knew I was going to
use it measure also LLC accesses in the case of L2 CAT test. I'll see if I
can improve this wording somehow.

> > + *
> > + * Return: =0 on success. <0 on failure.
> > + */
> > static int measure_llc_perf(struct resctrl_val_param *param, int bm_pid)
> > {
> > - __u64 llc_perf_miss = 0;
> > int ret;
> >
> > - /*
> > - * Measure cache miss from perf.
> > - */
> > - ret = get_llc_perf(&llc_perf_miss);
> > - if (ret < 0)
> > - return ret;
> > + /* Stop counters after one span to get miss rate */
> > + ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
> >
> > - ret = print_results_cache(param->filename, bm_pid, llc_perf_miss);
> > - return ret;
> > + ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
> > + close(fd_lm);
>
> I am not able to tell where this close() moved from.

Another good catch.

It seems to be a conflict resolution fallout from the time when I changed
to close() logic in that fix patch related to this fd.

--
i.