RE: [PATCH v3 4/5] selftests/resctrl: Cleanup properly when an error occurs in CAT test

From: Shaopeng Tan (Fujitsu)
Date: Tue Nov 08 2022 - 03:33:30 EST


Hi Reinette and Shuah,

> On 11/2/2022 2:41 AM, Shuah Khan wrote:
> > On 11/1/22 03:43, Shaopeng Tan wrote:
> >> After creating a child process with fork() in CAT test, if there is
> >> an error occurs or such as a SIGINT signal is received, the parent
> >> process will be terminated immediately, but the child process will
> >> not be killed and also umount_resctrlfs() will not be called.
> >>
> >> Add a signal handler like other tests to kill child process, umount
> >> resctrlfs, cleanup result files, etc. when an error occurs.
> >>
> >> Signed-off-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
> >> ---
> >>   tools/testing/selftests/resctrl/cat_test.c | 28
> >> +++++++++++++++-------
> >>   1 file changed, 19 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/tools/testing/selftests/resctrl/cat_test.c
> >> b/tools/testing/selftests/resctrl/cat_test.c
> >> index 6a8306b0a109..5f81817f4366 100644
> >> --- a/tools/testing/selftests/resctrl/cat_test.c
> >> +++ b/tools/testing/selftests/resctrl/cat_test.c
> >> @@ -98,12 +98,21 @@ void cat_test_cleanup(void)
> >>       remove(RESULT_FILE_NAME2);
> >>   }
> >>   +static void ctrl_handler(int signo)
> >> +{
> >> +    kill(bm_pid, SIGKILL);
> >> +    umount_resctrlfs();
> >> +    tests_cleanup();
> >> +    ksft_print_msg("Ending\n\n");
> >
> > Is there a reason to print this message? Remove it unless it serves a
> > purpose.
>
> This function appears to be a duplicate of existing resctrl_val.c:ctrlc_handler().
> Could the duplication be avoided instead of refining the copy?

Yes, it's a duplicate of existing resctrl_val.c:ctrlc_handler().
I will try to use resctrl_val.c:ctrlc_handler() in next version patch series.

> >> +
> >> +    exit(EXIT_SUCCESS);
> >> +}
> >> +
> >>   int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >>   {
> >>       unsigned long l_mask, l_mask_1;
> >>       int ret, pipefd[2], sibling_cpu_no;
> >>       char pipe_message;
> >> -    pid_t bm_pid;
> >
> > Odd. bm_pid is used below - why remove it here?
>
> There is a global bm_pid in resctrl_val.c that is made available via extern in
> resctrl.h. This is what causes this code to still compile but I would also like to
> learn why moving to that is desired as a change here. I expect such a big
> change to get a mention in the commit message.

Yes. I used global bm_pid, since bm_pid is used in ctrl_handler() before this function cat_perf_miss_val().
I will add a description to commit message.

> >>         cache_size = 0;
> >>   @@ -181,17 +190,19 @@ int cat_perf_miss_val(int cpu_no, int n, char
> >> *cache_type)
> >>           strcpy(param.filename, RESULT_FILE_NAME1);
> >>           param.num_of_runs = 0;
> >>           param.cpu_no = sibling_cpu_no;
> >> +    } else {
> >> +        /* set up ctrl-c handler */
> >> +        if (signal(SIGINT, ctrl_handler) == SIG_ERR ||
> >> +            signal(SIGHUP, ctrl_handler) == SIG_ERR ||
> >> +            signal(SIGTERM, ctrl_handler) == SIG_ERR)
> >> +            printf("Failed to catch SIGNAL!\n");
> >
> > Is perror() more appropriate here?
>
> Should we be using signal() at all? "man signal" reads:
> "WARNING: the behavior of signal() varies across UNIX versions, and has also
> varied historically across different versions of Linux.
> Avoid its use: use sigaction(2) instead."
>
> "Failed to catch SIGNAL" also seems unclear to me. This is where a signal
> handler is set up, the signal for which the handler is installed has not arrived.

I will use sigaction(2) and perror().

> >>       }
> >>         remove(param.filename);
> >>         ret = cat_val(&param);
> >> -    if (ret)
> >> -        return ret;
> >> -
> >> -    ret = check_results(&param);
> >> -    if (ret)
> >> -        return ret;
> >> +    if (ret == 0)
> >> +        ret = check_results(&param);
> >
> > Why not use a goto in error case to do umount_resctrlfs() instead of
> > changing the conditionals?
>
> My understanding is the code that follows is needed to synchronize the exits
> between the parent and child. It is the parent that will run umount_resctrlfs()
> and it should only do so after the child is done. A goto by the parent may thus
> cause
> umount_resctrlfs() to be run while the child still relies on it while a goto by the
> child may cause the parent not to receive the message that the child is
> complete.

Yes, the code that follows is needed to synchronize the exits between the parent and child.

> > @@ -201,7 +212,6 @@ int cat_perf_miss_val(int cpu_no, int n, char
> *cache_type)
> > sizeof(pipe_message)) {
> > close(pipefd[1]);
> > perror("# failed signaling parent process");
> > - return errno;
>
> It looks like pipefd[1] will be closed twice if the write() failed.

This close(pipefd[1]); should also be removed.

> It does look strange to let the child continue to its infinite loop after the write()
> failed. I assume that it is because the parent will also be stuck and the new
> ctrl_handler() is expected to unblock both?

If a SIGNAL is received, ctrl_handler() will be called to kill the child process and exit parent process.
If no SIGNAL is received, the parent process will kill the child process. (by else{kill(bm_pid, SIGKILL);})

> Could you please add a comment to the code to clarify this flow?
I will add comments here.

Best regards,
Shaopeng Tan