Re: [syzbot] WARNING: locking bug in umh_complete

From: Luis Chamberlain
Date: Mon Feb 06 2023 - 10:51:38 EST


On Sat, Feb 04, 2023 at 09:48:39AM +0900, Tetsuo Handa wrote:
> On 2023/02/03 23:31, Peter Zijlstra wrote:
> > Yes, I meant something like so.
> >
> >
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index 850631518665..0e69e1e4b6fe 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -438,21 +438,24 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> > if (wait == UMH_NO_WAIT) /* task has freed sub_info */
> > goto unlock;
> >
> > - if (wait & UMH_KILLABLE)
> > - state |= TASK_KILLABLE;
> > -
> > - if (wait & UMH_FREEZABLE)
> > + if (wait & UMH_FREEZABLE) {
> > state |= TASK_FREEZABLE;
> >
> > - retval = wait_for_completion_state(&done, state);
> > - if (!retval)
> > - goto wait_done;
> > -
> > if (wait & UMH_KILLABLE) {
> > + retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
> > + if (!retval)
> > + goto wait_done;
> > +
> > /* umh_complete() will see NULL and free sub_info */
> > if (xchg(&sub_info->complete, NULL))
> > goto unlock;
> > +
> > + /*
> > + * fallthrough; in case of -ERESTARTSYS now do uninterruptible
> > + * wait_for_completion().
> > + */
> > }
> > + wait_for_completion_state(&done, state);
> >
> > wait_done:
> > retval = sub_info->retval;
>
> Please fold below diff, provided that wait_for_completion_state(TASK_FREEZABLE)
> does not return when the current thread was frozen. (If
> wait_for_completion_state(TASK_FREEZABLE) returns when the current thread was
> frozen, we will fail to execute the
>
> retval = sub_info->retval;
>
> line in order to get the exit code after the current thread is thawed...)
>
> diff --git a/kernel/umh.c b/kernel/umh.c
> index 0e69e1e4b6fe..a776920ec051 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -431,35 +431,38 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> * This makes it possible to use umh_complete to free
> * the data structure in case of UMH_NO_WAIT.
> */
> sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done;
> sub_info->wait = wait;
>
> queue_work(system_unbound_wq, &sub_info->work);
> if (wait == UMH_NO_WAIT) /* task has freed sub_info */
> goto unlock;
>
> - if (wait & UMH_FREEZABLE) {
> + if (wait & UMH_FREEZABLE)
> state |= TASK_FREEZABLE;
>
> if (wait & UMH_KILLABLE) {
> retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
> if (!retval)
> goto wait_done;
>
> /* umh_complete() will see NULL and free sub_info */
> if (xchg(&sub_info->complete, NULL))
> goto unlock;
>
> /*
> * fallthrough; in case of -ERESTARTSYS now do uninterruptible
> - * wait_for_completion().
> + * wait_for_completion_state(). Since umh_complete() shall call
> + * complete() in a moment if xchg() above returned NULL, this
> + * uninterruptible wait_for_completion_state() will not block
> + * SIGKILL'ed process for so long.
> */
> }
> wait_for_completion_state(&done, state);

I think this seems to be the same issue that Schspa Shi reported / provided a
fix sugggestion for [0]. This lead me to ask if:

a) incorrect usage of completion on stack could be generic and;
b) if we should instead have an API helper for that?

Although he already implemented a suggestion for b) to answer a) we need
some SmPL constructs yet to be written by Schspa. The reason I asked for
b) is that if this is a regular pattern it begs for a) as this sort of
issue could be prevalent in other places. So the status of Schspa's work
was that he was going to work on the SmPL grammar to check how frequent
this incorrect patern could be found.

Please let me know your thoughts as perhaps this issue / discussion
didn't get on Peter's radar as it was rececently discussed with Schspa
despite being on Cc.

[0] https://lore.kernel.org/all/m2pmcoag55.fsf@xxxxxxxxx/T/#u

Luis