Re: [syzbot] WARNING: locking bug in umh_complete

From: Peter Zijlstra
Date: Fri Feb 03 2023 - 07:30:36 EST


On Fri, Feb 03, 2023 at 01:19:53PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2023 at 07:48:35PM +0900, Tetsuo Handa wrote:
> > On 2023/02/03 19:22, Tetsuo Handa wrote:
> > > Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
> > > logic"), for that commit for unknown reason omits wait_for_completion(&done) call
> > > when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
> > >
> > > Peter, is it safe to restore wait_for_completion(&done) call?
> > >
> >
> > Something like this?
> >
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index 850631518665..97230edb1849 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -441,8 +441,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> > if (wait & UMH_KILLABLE)
> > state |= TASK_KILLABLE;
> >
> > - if (wait & UMH_FREEZABLE)
> > - state |= TASK_FREEZABLE;
> > + //if (wait & UMH_FREEZABLE)
> > + // state |= TASK_FREEZABLE;
> >
> > retval = wait_for_completion_state(&done, state);
> > if (!retval)
> > @@ -452,7 +452,9 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> > /* umh_complete() will see NULL and free sub_info */
> > if (xchg(&sub_info->complete, NULL))
> > goto unlock;
> > + /* fallthrough, umh_complete() was already called */
> > }
> > + wait_for_completion(&done);
> >
> > wait_done:
> > retval = sub_info->retval;
> >
> > How does TASK_FREEZABLE affect here?
>
> It marks those waits that are safe to convert to a frozen state.
>
> > Since call_usermodehelper_exec() is a function for starting and
> > waiting for termination of a userspace process (which is subjected to
> > freezing), the caller of call_usermodehelper_exec() can't wait for the
> > termination of that userspace process if that process was frozen, and
> > wait_for_completion()
> > blocks forever?
>
> It'll probably make the freeze fail and abort the suspend. We first
> freezer userspace (including the helper), then we try and freeze all the
> kernel threads. If we can't, we error out and abort -- waking everything
> back up.
>
> But now I realize what I missed before, wait_for_completion() it not
> interruptible.
>
> I think the right fix is to:
>
> state &= ~TASK_KILLABLE;

state &= ~__TASK_WAKEKILL;

we don't want to mask out UNINTERUPTIBLE, that would be bad.

> wait_for_completion_state(&done, state);
>
> Also, put in a comment..