Re: proc_flush_task oops

From: Eric W. Biederman
Date: Thu Dec 21 2017 - 11:41:58 EST


Dave Jones <davej@xxxxxxxxxxxxxxxxx> writes:

> On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote:
> > On 12/21/17, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > > I have stared at this code, and written some test programs and I can't
> > > see what is going on. alloc_pid by design and in implementation (as far
> > > as I can see) is always single threaded when allocating the first pid
> > > in a pid namespace. idr_init always initialized idr_next to 0.
> > >
> > > So how can we get past:
> > >
> > > 	if (unlikely(is_child_reaper(pid))) {
> > > 		if (pid_ns_prepare_proc(ns)) {
> > > 			disable_pid_allocation(ns);
> > > 			goto out_free;
> > > 		}
> > > 	}
> > >
> > > with proc_mnt still set to NULL is a mystery to me.
> > >
> > > Is there any chance the idr code doesn't always return the lowest valid
> > > free number? So init gets assigned something other than 1?
> >
> > Well, this theory is easy to test (attached).
>
> I'll give this a shot and report back when I get to the office.
>
> > There is a "valid" way to break the code via kernel.ns_last_pid:
> > unshare+write+fork but the reproducer doesn't seem to use it (or it does?)
>
> that sysctl is root only, so that isn't at play here.

The check is ns_capable(CAP_SYS_ADMIN), which also passes for root in a
user namespace. So the sysctl should be fuzzable.

The ns_last_pid sysctl is still not in play, because a write to it changes
task_active_pid_ns (aka the pid namespace of the caller's pid), not
pid_ns_for_children.
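
To make that distinction concrete, here is a toy userspace model. It is
not kernel code: every toy_* name is made up for illustration, and the
"last_pid + 1" allocation is a deliberate simplification of what
alloc_pid does. It only shows that a write aimed at the active pid
namespace cannot disturb the first pid handed out in a freshly unshared
one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct pid_namespace: only the field this model needs. */
struct toy_pid_ns {
	int last_pid;	/* models the value behind kernel.ns_last_pid */
};

/*
 * Toy stand-in for a task: the namespace it lives in versus the one
 * its future children will be created in.
 */
struct toy_task {
	struct toy_pid_ns *active_ns;		/* models task_active_pid_ns() */
	struct toy_pid_ns *ns_for_children;	/* models pid_ns_for_children */
};

/* Models unshare(CLONE_NEWPID): only the children's namespace changes. */
static void toy_unshare_newpid(struct toy_task *t, struct toy_pid_ns *fresh)
{
	fresh->last_pid = 0;
	t->ns_for_children = fresh;
}

/* Models a write to kernel.ns_last_pid: it lands in the writer's active ns. */
static void toy_write_ns_last_pid(struct toy_task *t, int value)
{
	t->active_ns->last_pid = value;
}

/* Models the next fork: hand out last_pid + 1 in the children's namespace. */
static int toy_fork_pid(struct toy_task *t)
{
	return ++t->ns_for_children->last_pid;
}
```

In this model, calling toy_unshare_newpid(), then toy_write_ns_last_pid(),
then toy_fork_pid() still yields pid 1 for the child, because the write
only ever touched the parent's namespace.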

Every time I think of a "valid" way to break the code, I double check
myself and find there are already checks in place to prevent that.

Eric