Re: [syzbot] WARNING: locking bug in umh_complete

From: Peter Zijlstra
Date: Fri Feb 03 2023 - 07:20:28 EST


On Fri, Feb 03, 2023 at 07:48:35PM +0900, Tetsuo Handa wrote:
> On 2023/02/03 19:22, Tetsuo Handa wrote:
> > Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
> > logic"), for that commit for unknown reason omits wait_for_completion(&done) call
> > when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
> >
> > Peter, is it safe to restore wait_for_completion(&done) call?
> >
>
> Something like this?
>
> diff --git a/kernel/umh.c b/kernel/umh.c
> index 850631518665..97230edb1849 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -441,8 +441,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> if (wait & UMH_KILLABLE)
> state |= TASK_KILLABLE;
>
> - if (wait & UMH_FREEZABLE)
> - state |= TASK_FREEZABLE;
> + //if (wait & UMH_FREEZABLE)
> + // state |= TASK_FREEZABLE;
>
> retval = wait_for_completion_state(&done, state);
> if (!retval)
> @@ -452,7 +452,9 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> /* umh_complete() will see NULL and free sub_info */
> if (xchg(&sub_info->complete, NULL))
> goto unlock;
> + /* fallthrough, umh_complete() was already called */
> }
> + wait_for_completion(&done);
>
> wait_done:
> retval = sub_info->retval;
>
> How does TASK_FREEZABLE affect here?

It marks those waits that are safe to convert to a frozen state.

> Since call_usermodehelper_exec() is a function for starting and
> waiting for termination of a userspace process (which is subjected to
> freezing), the caller of call_usermodehelper_exec() can't wait for the
> termination of that userspace process if that process was frozen, and
> wait_for_completion()
> blocks forever?

It'll probably make the freeze fail and abort the suspend. We first
freezer userspace (including the helper), then we try and freeze all the
kernel threads. If we can't, we error out and abort -- waking everything
back up.

But now I realize what I missed before, wait_for_completion() it not
interruptible.

I think the right fix is to:

state &= ~TASK_KILLABLE;
wait_for_completion_state(&done, state);

Also, put in a comment..