Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support

From: Andy Lutomirski
Date: Sun Oct 19 2014 - 14:27:13 EST

Next message: Alexandre Belloni: "[PATCHv8 0/2] Add Allwinner SoCs PWM support"
Previous message: Sebastian Schmidt: "[PATCH] pstore: honor dmesg_restrict sysctl on dmesg dumps"
In reply to: Eric W. Biederman: "Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support"
Next in thread: Eric W.Biederman: "Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>
>> Quoting Aditya Kali (adityakali@xxxxxxxxxx):
>>> On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
>>> > Quoting Aditya Kali (adityakali@xxxxxxxxxx):
>>> >> setns on a cgroup namespace is allowed only if
>>> >> * task has CAP_SYS_ADMIN in its current user-namespace and
>>> >> over the user-namespace associated with target cgroupns.
>>> >> * task's current cgroup is descendent of the target cgroupns-root
>>> >> cgroup.
>>> >
>>> > What is the point of this?
>>> >
>>> > If I'm a user logged into
>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
>>> > a container which is in
>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
>>> > then I will want to be able to enter the container's cgroup.
>>> > The container's cgroup root is under my own (satisfying the
>>> > below condition0 but my cgroup is not a descendent of the
>>> > container's cgroup.
>>> >
>>> This condition is there because we don't want to do implicit cgroup
>>> changes when a process attaches to another cgroupns. cgroupns tries to
>>> preserve the invariant that at any point, your current cgroup is
>>> always under the cgroupns-root of your cgroup namespace. But in your
>>> example, if we allow a process in "session-c12.scope" container to
>>> attach to cgroupns root'ed at "session-c12.scope/x1" container
>>> (without implicitly moving its cgroup), then this invariant won't
>>> hold.
>>
>> Oh, I see. Guess that should be workable. Thanks.
>
> Which has me looking at what the rules are for moving through
> the cgroup hierarchy.
>
> As long as we have write access to cgroup.procs and are allowed
> to open the file for write, we can move any of our own tasks
> into the cgroup. So the cgroup namespace rules don't seem
> to be a problem.
>
> Andy can you please take a look at the permission checks in
> __cgroup_procs_write.

The actual requirements for calling that function haven't changed,
right? IOW, what does this have to do with cgroupns? Is the idea
that you want a privileged user wrt a cgroupns's userns to be able to
use this? If so:

Yes, that current_cred() thing is bogus. (Actually, this is probably
exploitable right now if any cgroup.procs inode anywhere on the system
lets non-root write.) (Can we have some kernel debugging option that
makes any use of current_cred() in write(2) warn?)

We really need a weaker version of may_ptrace for this kind of stuff.
Maybe the existing may_ptrace stuff is okay, actually. But this is
completely missing group checks, cap checks, capabilities wrt the
userns, etc.

Also, I think that, if this version of the patchset allows non-init
userns to unshare cgroupns, then the issue of what permission is
needed to lock the cgroup hierarchy like that needs to be addressed,
because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
the calling task with no permission required. Bolting on a fix later
will be a mess.

--Andy

>
> As I read the code I see 3 security gaffaws in the permssion check.
> - Using current->cred instead of file->f_cred.
> - Not checking tcred->euid.
> - Checking GLOBAL_ROOT_UID instead of having a capable call.
>
> The file permission on cgroup.procs seem just sufficient to keep
> to keep those bugs from being easily exploitable.
>
> Eric

--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Alexandre Belloni: "[PATCHv8 0/2] Add Allwinner SoCs PWM support"
Previous message: Sebastian Schmidt: "[PATCH] pstore: honor dmesg_restrict sysctl on dmesg dumps"
In reply to: Eric W. Biederman: "Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support"
Next in thread: Eric W.Biederman: "Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]