Re: [PATCH v4 04/19] selftests/resctrl: Close perf value read fd on errors

From: Reinette Chatre
Date: Fri Jul 14 2023 - 13:36:26 EST


Hi Ilpo,

On 7/14/2023 3:35 AM, Ilpo Järvinen wrote:
> On Thu, 13 Jul 2023, Reinette Chatre wrote:
>> On 7/13/2023 6:19 AM, Ilpo Järvinen wrote:
>>> Perf event fd (fd_lm) is not closed on some error paths.
>>>
>>> Always close fd_lm in get_llc_perf() and add close into an error
>>> handling block in cat_val().
>>>
>>> Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
>>> ---
>>> tools/testing/selftests/resctrl/cache.c | 10 +++++-----
>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
>>> index 8a4fe8693be6..ced47b445d1e 100644
>>> --- a/tools/testing/selftests/resctrl/cache.c
>>> +++ b/tools/testing/selftests/resctrl/cache.c
>>> @@ -87,21 +87,20 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
>>> static int get_llc_perf(unsigned long *llc_perf_miss)
>>> {
>>> __u64 total_misses;
>>> + int ret;
>>>
>>> /* Stop counters after one span to get miss rate */
>>>
>>> ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
>>>
>>> - if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
>>> + ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
>>> + close(fd_lm);
>>> + if (ret == -1) {
>>> perror("Could not get llc misses through perf");
>>> -
>>> return -1;
>>> }
>>>
>>> total_misses = rf_cqm.values[0].value;
>>> -
>>> - close(fd_lm);
>>> -
>>> *llc_perf_miss = total_misses;
>>>
>>> return 0;
>>> @@ -253,6 +252,7 @@ int cat_val(struct resctrl_val_param *param)
>>> memflush, operation, resctrl_val)) {
>>> fprintf(stderr, "Error-running fill buffer\n");
>>> ret = -1;
>>> + close(fd_lm);
>>> break;
>>> }
>>>
>>
>> Instead of fixing these existing patterns I think it would make the code
>> easier to understand and maintain if it is made symmetrical.
>> Having the perf event fd opened in one place but its close()
>> scattered elsewhere has the potential for confusion and making later
>> mistakes easy to miss.
>>
>> What if perf event fd is closed in a new "disable_llc_perf()" that
>> is matched with "reset_enable_llc_perf()" and called
>> from cat_val()?
>>
>> I think this raises another issue with the test trickery where
>> measure_cache_vals() has some assumptions about state based on the
>> test name.
>
> I very much agree on the principle here, and thus I already have created
> patches which will do a major cleanup on this area. The cleaned-up code
> has pe_fd local var to cat_val() and handles closing it in cat_val() with
> the usual patterns.
>
> However, the patch currently resides after the L3 CAT test rewrite.
> Backporting the cleanups/refactors into this series would require
> considerable effort due to how convoluted all those n-step cleanup patches
> and L3 CAT test rewrite are in this area. There's just a lot to
> clean up here and the L3 rewrite will touch the same areas, so it's a net
> full of conflicts.
>
> Do you want me to spend the effort to backport them into this series
> (I expect it will take some time)?

Considering the "Fixes" tag, having a smaller fix that can easily
be backported would be ideal so I am ok with deferring a bigger
rework.

I do think this fix can be made more robust with a couple of small
changes that should not introduce significant conflicts:
* initialize fd_lm to -1
* do not close() fd_lm in get_llc_perf() but instead move its
close() to the exit path of cat_val().
* add check in get_llc_perf() that it does not attempt ioctl()
on "fd_lm == -1" (later addition would be error checking of
the ioctl())
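
To illustrate the three points, here is a rough standalone sketch of the
pattern. It is not the selftest code: open_counter(), read_counter(),
close_counter() and run_test() are made-up stand-ins for
reset_enable_llc_perf(), get_llc_perf() and cat_val(), and a plain
open()/read() on a file replaces the perf event setup and ioctl() so that
it builds on its own.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for the global perf event fd; -1 means "not open". */
static int fd_lm = -1;

/* Hypothetical stand-in for reset_enable_llc_perf(): the only opener. */
static int open_counter(const char *path)
{
	fd_lm = open(path, O_RDONLY);
	if (fd_lm == -1) {
		perror("Could not open counter");
		return -1;
	}

	return 0;
}

/*
 * Hypothetical stand-in for get_llc_perf(): refuses to touch an fd
 * that was never opened and never closes the fd itself.
 */
static int read_counter(unsigned long *val)
{
	unsigned char byte;

	if (fd_lm == -1)
		return -1;

	if (read(fd_lm, &byte, sizeof(byte)) != sizeof(byte)) {
		perror("Could not read counter");
		return -1;
	}

	*val = byte;

	return 0;
}

/* Single close point; safe to call when nothing is open. */
static void close_counter(void)
{
	if (fd_lm != -1) {
		close(fd_lm);
		fd_lm = -1;
	}
}

/*
 * Hypothetical stand-in for cat_val(): every path, including the
 * error paths, leaves through the same label that closes the fd.
 */
static int run_test(const char *path, int fail_midway)
{
	unsigned long val;
	int ret;

	ret = open_counter(path);
	if (ret)
		goto out;

	if (fail_midway) {
		/* Simulates a failure between open and read. */
		ret = -1;
		goto out;
	}

	ret = read_counter(&val);
out:
	close_counter();

	return ret;
}
```

With this shape, a failure between open and read (the "fail_midway" case
here) cannot leak the fd, and calling the read helper before anything was
opened fails cleanly instead of operating on a stale descriptor.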

> I currently have these items pending besides this series (in order):
> - L3 CAT test rewrite and its preparatory patches
> - More cleanups (including the pe_fd cleanup)
> - New generalized test framework
> - L2 CAT test

Thank you very much for taking this on.

Reinette