Re: [PATCH 0/3] Enable namespaced file capabilities

From: Serge E. Hallyn
Date: Thu Jun 22 2017 - 21:20:48 EST


Quoting James Bottomley (James.Bottomley@xxxxxxxxxxxxxxxxxxxxx):
> On Thu, 2017-06-22 at 18:36 -0500, Serge E. Hallyn wrote:
> > Yes, the use case is: to allow root in the container to set the
> > privilege itself, without endangering any resources not owned by
> > that root.
>
> OK, so you envisage the same filesystem being mounted in different user
> namespaces

Well no - in lxd we have a separate filesystem for each container.
The filesystems are not shared.

> and being able to see their own value for the xattr. It
> still seems a bit weird that they'd be able to change file contents and
> have that seen by the other userns but not xattrs.

Not sure what you mean. If they have privilege over the inode, they
can write an xattr targeted at their own root userid.
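
To make "targeted at their own root userid" concrete, here is a rough
sketch of what such an xattr value could look like. It assumes the
revision 3 layout of struct vfs_ns_cap_data in
include/uapi/linux/capability.h (magic_etc, two permitted/inheritable
pairs, then rootid, all little-endian 32-bit). The helper name and the
example rootid 100000 are made up for illustration.

```python
import struct

# Constants assumed from include/uapi/linux/capability.h
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_NET_RAW = 13


def pack_ns_cap(permitted, inheritable, rootid, effective=True):
    """Pack a namespaced file capability value (hypothetical helper).

    permitted/inheritable are 64-bit capability masks; rootid is the
    kuid of the root user of the namespace the capability targets.
    """
    magic = VFS_CAP_REVISION_3
    if effective:
        magic |= VFS_CAP_FLAGS_EFFECTIVE
    return struct.pack(
        "<6I",
        magic,
        permitted & 0xFFFFFFFF,    # data[0].permitted (low 32 bits)
        inheritable & 0xFFFFFFFF,  # data[0].inheritable
        permitted >> 32,           # data[1].permitted (high 32 bits)
        inheritable >> 32,         # data[1].inheritable
        rootid,
    )


# Example: CAP_NET_RAW for a container whose root is host uid 100000
# (the uid is an illustrative assumption).
blob = pack_ns_cap(1 << CAP_NET_RAW, 0, 100000)
```

Only a task whose namespace root maps to that rootid (or a descendant
of it) would be granted the capability, which is why this does not
endanger resources outside the container.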

> > If you're going to have a root owned host-wide
> > orchestration system setting up the rootfs, then you don't
> > necessarily need this at all.
>
> I wasn't thinking it would be root owned, just that it would have a
> predefined range of allowed uids and be able to map multiple containers
> to subsets of these.

Hm. In that case they should not be allowed to write your proposed
'security.capability@uid' capability, because that would also grant
capabilities over subuids which they were not delegated.

(but see below)

> > As you say a @uid to say "any unprivileged userns" might be useful.
> > The implication is that root on the host doesn't trust the image
> > enough to write a real global file capability, but trusts it enough
> > to 'endanger' all containers on the host. If that's the case, I have
> > no objection to adding this as a feature.
>
> Yes, precisely. The filesystem is certified as permitted to override
> the xattr whatever unprivileged mapping for root is in place.
>
> How would we effect the switch? I suppose some global flag because I
> can't see we'd be mixing use cases in a physical system.

I might be confused, but I thought CAP_SETFCAP against init_user_ns would
be required to set 'security.capability@uid'. That, or you could create
a user namespace mapping [ 1 - 4294967295 ] to [ 0 - 4294967294 ], and
have CAP_SETFCAP against that namespace, which would allow you to run
without host root privilege.
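
For illustration only, the mapping above amounts to shifting every host
id down by one, so that host uid 0 has no mapping in the namespace. A
minimal sketch of that arithmetic follows. The function names are
invented here, the numbers are taken as written, and I have not checked
whether the kernel accepts an extent reaching the very top of the range.

```python
# One extent in /proc/<pid>/uid_map order:
#   (first id inside the namespace, first id outside, count)
EXTENT = (0, 1, 4294967295)


def uid_map_line(extent):
    """Format an extent the way a uid_map line is written."""
    return "%d %d %d" % extent


def host_to_ns(host_id, extents):
    """Translate a host id into the namespace, or None if unmapped."""
    for ns_first, host_first, count in extents:
        if host_first <= host_id < host_first + count:
            return ns_first + (host_id - host_first)
    return None


line = uid_map_line(EXTENT)               # "0 1 4294967295"
root_in_ns = host_to_ns(1, [EXTENT])      # host uid 1 becomes ns root
host_root = host_to_ns(0, [EXTENT])       # host root is not mapped
```

Since host uid 0 is unmapped, CAP_SETFCAP in that namespace covers
every other id without requiring host root.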

-serge