Re: [PATCH v4] Introduce v3 namespaced file capabilities

From: Serge E. Hallyn
Date: Tue Jun 13 2017 - 19:50:33 EST


Quoting Serge E. Hallyn (serge@xxxxxxxxxx):
> Quoting Stefan Berger (stefanb@xxxxxxxxxxxxxxxxxx):
> > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > >Root in a non-initial user ns cannot be trusted to write a traditional
> > >security.capability xattr. If it were allowed to do so, then any
> > >unprivileged user on the host could map his own uid to root in a private
> > >namespace, write the xattr, and execute the file with privilege on the
> > >host.
> > >
> > >However supporting file capabilities in a user namespace is very
> > >desirable. Not doing so means that any programs designed to run with
> > >limited privilege must continue to support other methods of gaining and
> > >dropping privilege. For instance a program installer must detect
> > >whether file capabilities can be assigned, and assign them if so but set
> > >setuid-root otherwise. The program in turn must know how to drop
> > >partial capabilities, and do so only if setuid-root.
> >
> > Hi Serge,
> >
> >
> > I have been looking at the patch below, primarily to learn how we could
> > apply a similar technique to security.ima and security.evm for a
> > namespaced IMA. From the paragraphs above I thought that you solved
> > the problem of a shared filesystem where one now can write different
> > security.capability xattrs by effectively supporting for example
> > security.capability[uid=1000] and security.capability[uid=2000]
>
> Interesting idea. Worth considering.
>
> > written into the filesystem. Each would then become visible as
> > security.capability if the userns mapping is set appropriately.
> > However, this doesn't seem to be how it is implemented. There seems
>
> Indeed, when I was considering supporting multiple simultaneous
> xattrs, I did it as something like:
>
> struct vfs_ns_cap_data {
> 	struct {
> 		__le32 permitted;
> 		__le32 inheritable;
> 	} data[VFS_CAP_U32];
> 	__le32 rootid;
> };
>
> struct vfs_ns_cap {
> 	__le32 magic_etc;
> 	__le32 n_entries;
> 	struct vfs_ns_cap_data data[0];
> }; // followed by n_entries of struct vfs_ns_cap_data
>
> You're instead suggesting encoding the rootuid in the name,
> which is interesting.
>
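
For illustration only, here is a rough userspace sketch of how a reader
might walk the multi-entry layout quoted above to find the entry for a
given root id. None of this is from the patch: get_le32() and
find_entry() are made-up names, I am assuming VFS_CAP_U32 is 2 as in the
v2 format, and I am assuming the on-disk layout is packed with no
padding. The magic_etc word is skipped rather than validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VFS_CAP_U32 2u                      /* assumed: two 32-bit words per set */
#define HDR_SIZE    8u                      /* magic_etc + n_entries */
#define ENTRY_SIZE  (VFS_CAP_U32 * 8u + 4u) /* data[] pairs + rootid */

/* Decode a little-endian 32-bit value from raw xattr bytes. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the byte offset of the entry whose rootid
 * matches, or -1 if the blob is malformed or has no such entry.
 */
static long find_entry(const unsigned char *buf, size_t len, uint32_t rootid)
{
	uint32_t n, i;

	assert(buf != NULL);
	if (len < HDR_SIZE)
		return -1;
	n = get_le32(buf + 4);
	/* Reject an entry count that does not fit in the buffer. */
	if (n > (len - HDR_SIZE) / ENTRY_SIZE)
		return -1;
	for (i = 0; i < n; i++) {
		size_t off = HDR_SIZE + (size_t)i * ENTRY_SIZE;

		/* rootid sits after the permitted/inheritable pairs. */
		if (get_le32(buf + off + VFS_CAP_U32 * 8u) == rootid)
			return (long)off;
	}
	return -1;
}
```

With the root id encoded in the xattr name instead, this lookup would
become a plain name lookup and the n_entries bookkeeping would go away.
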
> > to be only a single such entry with uid appended to it and, if it
> > was a shared filesystem, the first one to set this attribute blocks
> > everyone else from writing the xattr. Is that how it works? Would
>
> Approximately - indeed there is only a single xattr. But it can be
> overwritten, so long as the writer has CAP_SETFCAP over the user_ns
> which mounted the filesystem.

Hang on. I've misspoken. That's the requirement for writing a
v2 xattr. To write a v3 xattr you only need to be privileged
(with CAP_SETFCAP) against the inode.
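
To make the distinction concrete, here is a toy userspace model of the
two rules. It is a sketch under loud assumptions, not the kernel code:
every struct and function name is invented, each namespace maps a
single contiguous range of host uids, gids and the namespace-owner rule
are ignored, and "privileged against the inode" is read as "holds
CAP_SETFCAP in its own user_ns and that user_ns maps the inode's owner".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct userns {
	const struct userns *parent; /* NULL for the initial namespace */
	unsigned kuid_first;         /* first host uid mapped into this ns */
	unsigned kuid_count;         /* number of host uids mapped */
};

struct task {
	const struct userns *ns;     /* namespace the task runs in */
	bool has_setfcap;            /* holds CAP_SETFCAP in that namespace */
};

struct inode_model {
	unsigned kuid;                  /* owner, as a host uid */
	const struct userns *sb_userns; /* user_ns that mounted the filesystem */
};

/* Capability over a target ns: held in the target itself or an ancestor. */
static bool capable_over(const struct task *t, const struct userns *target)
{
	const struct userns *ns;

	if (!t->has_setfcap)
		return false;
	for (ns = target; ns != NULL; ns = ns->parent)
		if (ns == t->ns)
			return true;
	return false;
}

static bool uid_mapped(const struct userns *ns, unsigned kuid)
{
	assert(ns != NULL);
	return kuid >= ns->kuid_first && kuid - ns->kuid_first < ns->kuid_count;
}

/* v2: CAP_SETFCAP over the user_ns which mounted the filesystem. */
static bool may_write_v2(const struct task *t, const struct inode_model *ino)
{
	return capable_over(t, ino->sb_userns);
}

/* v3 (assumed reading): CAP_SETFCAP in own ns, and inode owner mapped there. */
static bool may_write_v3(const struct task *t, const struct inode_model *ino)
{
	return t->has_setfcap && uid_mapped(t->ns, ino->kuid);
}
```

In this model, root in a container whose uid range covers the file's
owner passes may_write_v3() but fails may_write_v2() when the
filesystem was mounted from the initial namespace.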