Re: [PATCH] umh: fix UAF when the process is being killed

From: Schspa Shi
Date: Thu Dec 22 2022 - 07:08:18 EST



Schspa Shi <schspa@xxxxxxxxx> writes:

> Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:
>
>> On Thu, Dec 22, 2022 at 01:45:46PM +0800, Schspa Shi wrote:
>>>
>>> Schspa Shi <schspa@xxxxxxxxx> writes:
>>>
>>> > Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:
>>> >
>>> >> Peter, Ingo, Steven would like you're review.
>>> >>
>>> >> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote:
>>> >>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote:
>>> >>> > I'd like to upload a V2 patch with the new solution if you prefer the
>>> >>> > following way.
>>> >>> >
>>> >>> > diff --git a/kernel/umh.c b/kernel/umh.c
>>> >>> > index 850631518665..8023f11fcfc0 100644
>>> >>> > --- a/kernel/umh.c
>>> >>> > +++ b/kernel/umh.c
>>> >>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
>>> >>> > /* umh_complete() will see NULL and free sub_info */
>>> >>> > if (xchg(&sub_info->complete, NULL))
>>> >>> > goto unlock;
>>> >>> > + /*
>>> >>> > + * kthreadd (or new kernel thread) will call complete()
>>> >>> > + * shortly.
>>> >>> > + */
>>> >>> > + wait_for_completion(&done);
>>> >>> > }
>>> >>>
>>> >>> Yes much better. Did you verify it fixes the splat found by the bots?
>>> >>
>>> >> Wait, I'm not sure yet why this would fix it... I first started thinking
>>> >> that this may be a good example of a Coccinelle SmPL rule, something like:
>>> >>
>>> >> DECLARE_COMPLETION_ONSTACK(done);
>>> >> foo *foo;
>>> >> ...
>>> >> foo->completion = &done;
>>> >> ...
>>> >> queue_work(system_unbound_wq, &foo->work);
>>> >> ....
>>> >> ret = wait_for_completion_state(&done, state);
>>> >> ...
>>> >> if (!ret)
>>> >> S
>>> >> ...
>>> >> +wait_for_completion(&done);
>>> >>
>>> >> But that is pretty complex, and while it may be useful to know how many
>>> >> patterns we have like this, it begs the question if generalizing this
>>> >> inside the callers is best for -ERESTARTSYS condition is best. What
>>> >> do folks think?
>>> >>
>>> >> The rationale here is that if you queue stuff and give access to the
>>> >> completion variable but its on-stack obviously you can end up with the
>>> >> queued stuff complete() on a on-stack variable. The issue seems to
>>> >> be that wait_for_completion_state() for -ERESTARTSYS still means
>>> >> that the already scheduled queue'd work is *about* to run and
>>> >> the process with the completion on-stack completed. So we race with
>>> >> the end of the routine and the completion on-stack.
>>> >>
>>> >> It makes me wonder if wait_for_completion() above really is doing
>>> >> something more, if it is just helping with timing and is still error
>>> >> prone.
>>> >>
>>> >> The queued work will try the the completion as follows:
>>> >>
>>> >> static void umh_complete(struct subprocess_info *sub_info)
>>> >> {
>>> >> struct completion *comp = xchg(&sub_info->complete, NULL);
>>> >> /*
>>> >> * See call_usermodehelper_exec(). If xchg() returns NULL
>>> >> * we own sub_info, the UMH_KILLABLE caller has gone away
>>> >> * or the caller used UMH_NO_WAIT.
>>> >> */
>>> >> if (comp)
>>> >> complete(comp);
>>> >> else
>>> >> call_usermodehelper_freeinfo(sub_info);
>>> >> }
>>> >>
>>> >> So the race is getting -ERESTARTSYS on the process with completion
>>> >> on-stack and the above running complete(comp). Why would sprinkling
>>> >> wait_for_completion(&done) *after* wait_for_completion_state(&done, state)
>>> >> fix this UAF?
>>> >
>>> > The wait_for_completion(&done) is added when xchg(&sub_info->complete,
>>> > NULL) return NULL. When it returns NULL, it means the umh_complete was
>>> > using the completion variable at the same time and will call complete
>>> > in a very short time.
>>> >
>>> Hi Luis:
>>>
>>> Is there any further progress on this problem? Does the above
>>> explanation answer your doubts?
>>
>> I think it would be useful to proove your work for you to either
>> hunt with SmPL coccinelle a similar flaw / how rampant this issue
>> is and then also try to create the same UAF there and prove how
>> your changes fixes it.
>>
>
> OK, but it will take some time.

Hi Luis:

I have made a kernel module to prove this fix. Please check this at:
Link: https://github.com/schspa/code_snippet/tree/master/kernel_module/completion

There is both bad & ok test case in the README.org.

>
>> Luis


--
BRs
Schspa Shi