Re: [RFC 20/20] ima: Setup securityfs_ns for IMA namespace

From: Stefan Berger
Date: Wed Dec 01 2021 - 16:35:14 EST



On 12/1/21 16:11, James Bottomley wrote:
> On Wed, 2021-12-01 at 15:25 -0500, Stefan Berger wrote:
>> On 12/1/21 14:21, James Bottomley wrote:
>>> On Wed, 2021-12-01 at 13:11 -0500, Stefan Berger wrote:
>>>> On 12/1/21 12:56, James Bottomley wrote:
>>>>> [...]
>>>> I tried this with runc and an active user namespace that maps uid
>>>> 1000 on the host to uid 0 in the container. There I run into the
>>>> problem that, without the above work-around, all of the files and
>>>> directories are mapped to 'nobody', just as all the files in sysfs
>>>> are in this case. This code resolved the issue.
>>> So I applied your patches with the permission shift commented out
>>> and instrumented inode_alloc() to see where it might be failing, and
>>> I actually find it all works as expected for me:
>>>
>>> ejb@testdeb:~> unshare -r --user --mount --ima
>>> root@testdeb:~# mount -t securityfs_ns none /sys/kernel/security
>>> root@testdeb:~# ls -l /sys/kernel/security/ima/
>>> total 0
>>> -r--r----- 1 root root 0 Dec 1 19:11 ascii_runtime_measurements
>>> -r--r----- 1 root root 0 Dec 1 19:11 binary_runtime_measurements
>>> -rw------- 1 root root 0 Dec 1 19:11 policy
>>> -r--r----- 1 root root 0 Dec 1 19:11 runtime_measurements_count
>>> -r--r----- 1 root root 0 Dec 1 19:11 violations
>>>
>>> I think your problem has something to do with how runc is installing
>>> the uid/gid mappings. If it installs them after the securityfs_ns
>>> inodes are created, then the inodes get the -1 value (because no
>>> mappings exist in s_user_ns yet). I can even demonstrate this by
>>> forcing unshare to enter the IMA namespace before writing the
>>> mapping values, and then I see "nobody nogroup" above like you do.
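A rough way to picture the -1 value described above is a tiny model of uid translation. The sketch below is hypothetical and greatly simplified: model_make_kuid() and model_from_kuid_munged() are made-up names that only loosely mirror the kernel's make_kuid() and from_kuid_munged(), and it assumes the default overflow uid of 65534 that ls shows as 'nobody'. An owner computed while the namespace still has no uid_map stays invalid even after a map such as "0 1000 1" (namespace uid 0 to host uid 1000, one id) is written, while an owner computed afterwards maps back to root.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a user namespace uid_map; the kernel's
 * struct uid_gid_map and its lookup helpers are more elaborate. */
#define INVALID_ID  ((uint32_t)-1) /* like INVALID_UID: "no mapping" */
#define OVERFLOW_ID 65534u         /* assumed default overflowuid ("nobody") */
#define MAX_EXTENTS 5              /* assumption: small fixed-size map */

struct id_extent {
    uint32_t first;       /* first id inside the namespace */
    uint32_t lower_first; /* first id in the parent (here: host) namespace */
    uint32_t count;       /* number of consecutive ids */
};

struct model_userns {
    unsigned nr_extents;  /* 0 until /proc/<pid>/uid_map has been written */
    struct id_extent extent[MAX_EXTENTS];
};

/* Namespace id -> host id, in the spirit of make_kuid(). */
uint32_t model_make_kuid(const struct model_userns *ns, uint32_t uid)
{
    assert(ns->nr_extents <= MAX_EXTENTS);
    for (unsigned i = 0; i < ns->nr_extents; i++) {
        const struct id_extent *e = &ns->extent[i];
        if (uid >= e->first && uid - e->first < e->count)
            return e->lower_first + (uid - e->first);
    }
    return INVALID_ID; /* no mapping yet: the stored owner becomes -1 */
}

/* Host id -> namespace id as stat() would report it, in the spirit of
 * from_kuid_munged(): invalid or unmapped ids become the overflow id. */
uint32_t model_from_kuid_munged(const struct model_userns *ns, uint32_t kuid)
{
    assert(ns->nr_extents <= MAX_EXTENTS);
    if (kuid == INVALID_ID)
        return OVERFLOW_ID;
    for (unsigned i = 0; i < ns->nr_extents; i++) {
        const struct id_extent *e = &ns->extent[i];
        if (kuid >= e->lower_first && kuid - e->lower_first < e->count)
            return e->first + (kuid - e->lower_first);
    }
    return OVERFLOW_ID;
}
```

With an empty map, model_make_kuid(&ns, 0) gives INVALID_ID; once "0 1000 1" is installed, that stored owner still reads back as 65534 ('nobody'), whereas an owner computed after the write is 1000 on the host and 0 ('root') inside the namespace.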
>> I am surprised you get this mapping even after commenting out the
>> permission adjustments... it doesn't work for me when I comment them
>> out:
>>
>> [stefanb@ima-ns-dev rootfs]$ unshare -r --user --mount
>> [root@ima-ns-dev rootfs]# mount -t securityfs_ns none /sys/kernel/security/
>> [root@ima-ns-dev rootfs]# cd /sys/kernel/security/ima/
>> [root@ima-ns-dev ima]# ls -l
>> total 0
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 ascii_runtime_measurements
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 binary_runtime_measurements
>> -rw-------. 1 nobody nobody 0 Dec 1 15:20 policy
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 runtime_measurements_count
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 violations
>> [root@ima-ns-dev ima]# cat /proc/self/uid_map
>> 0 1000 1
>> [root@ima-ns-dev ima]# cat /proc/self/gid_map
>> 0 1000 1
>>
>> The initialization of securityfs and the setup of its files and
>> directories happen at the same time as the IMA namespace is created.
>> At that time no user mappings are available yet, which is why I need
>> to make the adjustments 'late'.
> There is one other possible difference: To get the correct s_user_ns

I am currently wondering why I cannot re-create your setup with the remapping disabled...

> on the securityfs_ns mount, the mount namespace itself has to be owned
> by the user namespace ... is runc doing that correctly? I always

Following an strace of 'runc create', I see a process call unshare(CLONE_NEWUSER) before it calls unshare(CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWNET), so runc does seem to do it in the order you suggest.
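Besides strace, the ownership in question can be checked directly: the NS_GET_USERNS ioctl documented in ioctl_ns(2) (Linux 4.9 and later) returns a file descriptor for the user namespace that owns a given namespace. A minimal sketch, assuming /proc is mounted; the helper name mntns_owned_by_own_userns() is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/nsfs.h> /* NS_GET_USERNS */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "<proc_dir>/ns/<ns>"; returns -1 on truncation or error. */
static int ns_path(char *buf, size_t len, const char *proc_dir, const char *ns)
{
    int n = snprintf(buf, len, "%s/ns/%s", proc_dir, ns);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if the mount namespace of the process described by proc_dir
 * ("/proc/self" or "/proc/<pid>") is owned by that process's own user
 * namespace, 0 if another user namespace owns it, -1 on error. */
int mntns_owned_by_own_userns(const char *proc_dir)
{
    char path[256];
    struct stat owner, own_userns;
    int mnt_fd, owner_fd, ret = -1;

    assert(proc_dir != NULL);
    if (ns_path(path, sizeof path, proc_dir, "mnt") < 0)
        return -1;
    mnt_fd = open(path, O_RDONLY);
    if (mnt_fd < 0)
        return -1;

    /* Ask the kernel which user namespace owns this mount namespace. */
    owner_fd = ioctl(mnt_fd, NS_GET_USERNS);
    close(mnt_fd);
    if (owner_fd < 0)
        return -1;

    /* Namespaces are identified by the (st_dev, st_ino) of their nsfs inode. */
    if (ns_path(path, sizeof path, proc_dir, "user") == 0 &&
        fstat(owner_fd, &owner) == 0 && stat(path, &own_userns) == 0)
        ret = owner.st_dev == own_userns.st_dev &&
              owner.st_ino == own_userns.st_ino;
    close(owner_fd);
    return ret;
}
```

Pointed at a container process from the host (opening another process's /proc/<pid>/ns files requires ptrace-read access to it), a result of 1 typically means the mount namespace was created after, or together with, that process's user namespace; 0 means some other user namespace owns it, which per the explanation in this thread is not the setup that gives the securityfs_ns mount the expected s_user_ns.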

Also, runc seems to have its own set of struggles. I am not sure we could ask them to accommodate us and do it 'correctly'; it doesn't sound 'easy' for them to get everything right under the hood either:

https://github.com/opencontainers/runc/blob/master/libcontainer/nsenter/nsexec.c#L919

     * In order for this unsharing code to be more extensible we need to split
     * up unshare(CLONE_NEWUSER) and clone() in various ways. The ideal case
     * would be if we did clone(CLONE_NEWUSER) and the other namespaces
     * separately, but because of SELinux issues we cannot really do that. But

[...]

     * However, if we unshare(2) the user namespace *before* we clone(2), then
     * all hell breaks loose.

sounds like fun

So I am not quite sure whether I am working around a runc issue, but to find out I would first like to re-create your successful setup and see what's different.

   Stefan


> forget this detail because unshare does it correctly automatically but
> it means you must unshare the user namespace first and then unshare the
> mount namespace (or do it in the same syscall because the kernel will
> get the correct order).
>
> James
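The ordering described above can be sketched in C. This is a minimal illustration, not runc's or util-linux's actual code: it assumes a single-threaded, unprivileged caller, maps only the caller's own uid/gid to 0 (a one-line map such as "0 1000 1"), and leaves the securityfs_ns mount as a comment because that filesystem type exists only with this RFC series applied. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a one-line id map "inside outside count\n" as accepted by
 * /proc/<pid>/uid_map and gid_map; returns its length, or -1 if truncated. */
int format_id_map(char *buf, size_t len, unsigned inside, unsigned outside,
                  unsigned count)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

static int write_file(const char *path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* User namespace first, then its mappings, then a mount namespace that is
 * owned by the new user namespace. Returns 0 on success, -1 on error. */
int enter_userns_then_mountns(void)
{
    char map[64];
    /* Capture host ids first; after unshare() they read as overflow ids. */
    uid_t uid = geteuid();
    gid_t gid = getegid();

    if (unshare(CLONE_NEWUSER) < 0)
        return -1;
    if (format_id_map(map, sizeof map, 0, uid, 1) < 0 ||
        write_file("/proc/self/uid_map", map) < 0)
        return -1;
    /* Unprivileged gid_map writes require setgroups to be denied first. */
    if (write_file("/proc/self/setgroups", "deny") < 0 ||
        format_id_map(map, sizeof map, 0, gid, 1) < 0 ||
        write_file("/proc/self/gid_map", map) < 0)
        return -1;

    /* This mount namespace is owned by the user namespace we are now in. */
    if (unshare(CLONE_NEWNS) < 0)
        return -1;

    /* With the RFC series applied, one would now mount securityfs_ns, e.g.
     * mount("none", "/sys/kernel/security", "securityfs_ns", 0, NULL); */
    return 0;
}
```

The single-call variant, unshare(CLONE_NEWUSER | CLONE_NEWNS), yields the same mount-namespace ownership because the kernel sets up the new user namespace before the other namespaces; the mappings are then written afterwards, as unshare(1) does.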