Re: uid=0 inside user-namespace and procfs file permissions

From: Eric W. Biederman
Date: Tue Sep 30 2014 - 20:35:43 EST


Aditya Kali <adityakali@xxxxxxxxxx> writes:

> Hi all,
>
> I am trying to run a process with uid=0 inside a userns. But when
> I also call capset() after setresuid(0, 0, 0), I see inconsistent
> proc file permissions. Almost all the files in /proc/<pid>/ have global
> 'root' as owner and group even though the actual process uid is
> correctly changed.
>
> I wrote a simple program that demonstrates the issue:
>
> 1. parent, as global root (uid=0 in init_user_ns) fork()s a child
> 2. child:
> a) unshare(CLONE_NEWUSER)
> b) [wait for parent to write uid_map]
> c) setresgid(id, id, id) ; setresuid(0, 0, 0);
> d) conditionally call capset() to clear capabilities
> e) execve(/bin/sleep)
> 3. parent:
> a) populates child's uid_map and maps some uid to 0 inside userns, e.g.:
> 0 99 1
> b) waitpid()
>
> (the actual program can be found at http://pastebin.com/f4P17VFn for
> your reference).
>
> When there is no capset() call after setresuid(0,0,0), everything is
> fine. But when I do a capset() to clear all capabilities, the 'owner'
> and 'group' of all the files under /proc/<child_pid>/ of the child
> process are reverted to global 'root' user.
>
> # without capset (2.d):
> root@vm1# id
> uid=0(root) gid=0(root) groups=0(root)
>
> root@vm1# ./userns_uid0
> child_pid: 24277
> proc_file: /proc/24277/uid_map
> proc_file: /proc/24277/gid_map
> child resuming
>
> ^Z
> [1]+ Stopped ./userns_uid0
> root@vm1# cat /proc/24277/uid_map
> 0 99 1
> root@vm1# cat /proc/24277/status | grep -e "Uid:" -e "Gid:"
> Uid: 99 99 99 99
> Gid: 99 99 99 99
> root@vm1# ls -l /proc/24277/
> total 0
> dr-xr-xr-x 2 nobody nobody 0 2014-09-30 16:31 attr
> -r-------- 1 nobody nobody 0 2014-09-30 16:31 auxv
> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cgroup
> --w------- 1 nobody nobody 0 2014-09-30 16:31 clear_refs
> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cmdline
> -rw-r--r-- 1 nobody nobody 0 2014-09-30 16:31 comm
> -rw-r--r-- 1 nobody nobody 0 2014-09-30 16:31 coredump_filter
> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cpuset
> ...
> [All files have owner='nobody' and group='nobody' .. same as that of
> the process]
>
> With the additional capset() call, the files under /proc/<child_pid>/
> are now owned by global root:
>
> root@vm1# ./userns_uid0 resetcaps
> child_pid: 24706
> proc_file: /proc/24706/uid_map
> proc_file: /proc/24706/gid_map
> child resuming
> resetting caps
> ^Z
> [2]+ Stopped ./userns_uid0 resetcaps
> root@vm1# cat /proc/24706/uid_map
> 0 99 1
> root@vm1# cat /proc/24706/status | grep -e "Uid:" -e "Gid:"
> Uid: 99 99 99 99
> Gid: 99 99 99 99
>
> [Everything as before till now]
>
> root@vm1# ls -l /proc/24706/
> total 0
> dr-xr-xr-x 2 nobody nobody 0 2014-09-30 16:47 attr
> -r-------- 1 root root 0 2014-09-30 16:47 auxv
> -r--r--r-- 1 root root 0 2014-09-30 16:47 cgroup
> --w------- 1 root root 0 2014-09-30 16:47 clear_refs
> -r--r--r-- 1 root root 0 2014-09-30 16:47 cmdline
> -rw-r--r-- 1 root root 0 2014-09-30 16:47 comm
> -rw-r--r-- 1 root root 0 2014-09-30 16:47 coredump_filter
> -r--r--r-- 1 root root 0 2014-09-30 16:47 cpuset
> ...
> -r--r--r-- 1 root root 0 2014-09-30 16:47 mountinfo
> -r--r--r-- 1 root root 0 2014-09-30 16:47 mounts
> -r-------- 1 root root 0 2014-09-30 16:47 mountstats
> dr-xr-xr-x 5 nobody nobody 0 2014-09-30 16:47 net
> dr-x--x--x 2 root root 0 2014-09-30 16:47 ns
> -r--r--r-- 1 root root 0 2014-09-30 16:47 numa_maps
> ...
> -r--r--r-- 1 root root 0 2014-09-30 16:47 status
> -r-------- 1 root root 0 2014-09-30 16:47 syscall
> dr-xr-xr-x 3 nobody nobody 0 2014-09-30 16:47 task
> ..
>
> Only the directories 'attr', 'net' and 'task' are owned by uid=99.
> All the other files are owned by global root.
>
> This behavior seems inconsistent. I ran this on a 3.17 kernel. Can
> someone with expertise in this area explain whether this is expected?

So I am not quite certain what you are seeing.

In general, proc files are expected to be owned by the euid of a process.
However, when task_dumpable is cleared, the files become owned by the
global root user. We have considered relaxing that to the namespace
root user, but so far a more granular task_dumpable has not been
implemented.
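
One way to check both halves of that rule from userspace is to read the
dumpable flag with prctl(PR_GET_DUMPABLE) and compare the owner of a
/proc/<pid>/ entry against the process euid. The following is only a
rough sketch: the helper names dumpable_flag() and proc_owner() are made
up for illustration, and it assumes a Linux system with /proc mounted
and a libc that exports prctl().

```python
import ctypes
import os

PR_GET_DUMPABLE = 3  # value from <linux/prctl.h>


def dumpable_flag():
    """Return the calling process's dumpable flag as reported by prctl(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    value = libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)
    if value < 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_GET_DUMPABLE) failed")
    return value


def proc_owner(pid, name="status"):
    """Return (uid, gid) owning /proc/<pid>/<name>, as seen by the caller."""
    st = os.stat(f"/proc/{pid}/{name}")
    return st.st_uid, st.st_gid


if __name__ == "__main__":
    # Hypothetical usage: inspect the current process.
    print("dumpable:", dumpable_flag())
    print("owner of /proc/self/status:", proc_owner(os.getpid()))
    print("euid/egid:", os.geteuid(), os.getegid())
```

Running proc_owner() against the child pid from another shell, with and
without the capset() step, should show whether the ownership change
lines up with the dumpable flag being cleared.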

The directories are world readable so they don't matter.

What puzzles me is that you have directories owned by nobody, and you
are talking about uid = 99 and gid = 99. Nobody is traditionally
(u16_t)-2, i.e. 65534, and should never actually be used by anyone. It
is also the default id reported for unmapped uids and gids.
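
For reference, the arithmetic behind that number and a way to read the
value the kernel is currently using can be sketched as below. The helper
names as_u16() and read_overflow_id() are made up for illustration; the
sketch assumes the overflow ids are exposed under
/proc/sys/kernel/overflowuid and /proc/sys/kernel/overflowgid, and it
returns None where they are not available.

```python
def as_u16(value):
    """Truncate an integer to 16 bits, like a C cast to an unsigned 16-bit type."""
    return value & 0xFFFF


def read_overflow_id(kind="uid"):
    """Read the kernel's overflow uid or gid; None if the sysctl is unavailable."""
    path = f"/proc/sys/kernel/overflow{kind}"
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("(u16)-2   =", as_u16(-2))
    print("overflowuid:", read_overflow_id("uid"))
    print("overflowgid:", read_overflow_id("gid"))
```

If the overflow id on the test machine is 65534 while 'nobody' in
/etc/passwd is 99, the two are different things that merely share a
name.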

It looks like you are doing something weird with nobody so I don't have
a clue what is actually going on.

Eric