Re: uid=0 inside user-namespace and procfs file permissions

From: Aditya Kali
Date: Tue Sep 30 2014 - 20:51:51 EST


On Tue, Sep 30, 2014 at 5:35 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Aditya Kali <adityakali@xxxxxxxxxx> writes:
>
>> Hi all,
>>
>> I am trying to run a process with uid=0 inside a userns. But when
>> I also do capset() after setresuid(0, 0, 0), I am seeing inconsistent
>> proc file permissions. Almost all the files in /proc/<pid>/ have global
>> 'root' as owner and group even if the actual process uid is correctly
>> changed.
>>
>> I wrote a simple program that demonstrates the issue:
>>
>> 1. parent, as global root (uid=0 in init_user_ns) fork()s a child
>> 2. child:
>>    a) unshare(CLONE_NEWUSER)
>>    b) [wait for parent to write uid_map]
>>    c) setresgid(id, id, id) ; setresuid(0, 0, 0);
>>    d) conditionally call capset() to clear capabilities
>>    e) execve(/bin/sleep)
>> 3. parent:
>>    a) populates child's uid_map and maps some uid to 0 inside userns. ex:
>>       0 99 1
>>    b) waitpid()
>>
>> (the actual program can be found at http://pastebin.com/f4P17VFn for
>> your reference).
>>
>> When there is no capset() call after setresuid(0,0,0), everything is
>> fine. But when I do a capset() to clear all capabilities, the 'owner'
>> and 'group' of all the files under /proc/<child_pid>/ of the child
>> process are reverted to global 'root' user.
>>
>> # without capset (2.d):
>> root@vm1# id
>> uid=0(root) gid=0(root) groups=0(root)
>>
>> root@vm1# ./userns_uid0
>> child_pid: 24277
>> proc_file: /proc/24277/uid_map
>> proc_file: /proc/24277/gid_map
>> child resuming
>>
>> ^Z
>> [1]+ Stopped ./userns_uid0
>> root@vm1# cat /proc/24277/uid_map
>> 0 99 1
>> root@vm1# cat /proc/24277/status | grep -e "Uid:" -e "Gid:"
>> Uid: 99 99 99 99
>> Gid: 99 99 99 99
>> root@vm1# ls -l /proc/24277/
>> total 0
>> dr-xr-xr-x 2 nobody nobody 0 2014-09-30 16:31 attr
>> -r-------- 1 nobody nobody 0 2014-09-30 16:31 auxv
>> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cgroup
>> --w------- 1 nobody nobody 0 2014-09-30 16:31 clear_refs
>> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cmdline
>> -rw-r--r-- 1 nobody nobody 0 2014-09-30 16:31 comm
>> -rw-r--r-- 1 nobody nobody 0 2014-09-30 16:31 coredump_filter
>> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cpuset
>> ...
>> [All files have owner='nobody' and group='nobody' .. same as that of
>> the process]
>>
>> With the additional capset() call, the files under /proc/<child_pid>/
>> are now owned by global root:
>>
>> root@vm1# ./userns_uid0 resetcaps
>> child_pid: 24706
>> proc_file: /proc/24706/uid_map
>> proc_file: /proc/24706/gid_map
>> child resuming
>> resetting caps
>> ^Z
>> [2]+ Stopped ./userns_uid0 resetcaps
>> root@vm1# cat /proc/24706/uid_map
>> 0 99 1
>> root@vm1# cat /proc/24706/status | grep -e "Uid:" -e "Gid:"
>> Uid: 99 99 99 99
>> Gid: 99 99 99 99
>>
>> [Everything as before till now]
>>
>> root@vm1# ls -l /proc/24706/
>> total 0
>> dr-xr-xr-x 2 nobody nobody 0 2014-09-30 16:47 attr
>> -r-------- 1 root root 0 2014-09-30 16:47 auxv
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 cgroup
>> --w------- 1 root root 0 2014-09-30 16:47 clear_refs
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 cmdline
>> -rw-r--r-- 1 root root 0 2014-09-30 16:47 comm
>> -rw-r--r-- 1 root root 0 2014-09-30 16:47 coredump_filter
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 cpuset
>> ...
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 mountinfo
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 mounts
>> -r-------- 1 root root 0 2014-09-30 16:47 mountstats
>> dr-xr-xr-x 5 nobody nobody 0 2014-09-30 16:47 net
>> dr-x--x--x 2 root root 0 2014-09-30 16:47 ns
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 numa_maps
>> ...
>> -r--r--r-- 1 root root 0 2014-09-30 16:47 status
>> -r-------- 1 root root 0 2014-09-30 16:47 syscall
>> dr-xr-xr-x 3 nobody nobody 0 2014-09-30 16:47 task
>> ..
>>
>> Only the directories 'attr', 'net' and 'task' are owned by uid=99.
>> All the remaining files are owned by global root.
>>
>> This behavior seems inconsistent. I ran this on 3.17 kernel. Can
>> someone with expertise in this area explain if this is expected?
>
> So I am not quite certain what you are seeing.
>
> In general proc files are expected to be owned by the euid of a process.
> However when the task_dumpable is cleared the files become owned by the
> global root user. We have considered relaxing that to the namespace
> root user but so far implementing a more granular task_dumpable has not
> been done.
>

I tried explicitly setting PR_SET_DUMPABLE before execve(), but that
didn't help either.
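
For reference, setting the flag and reading it back can be sketched
roughly as below. This is only an illustration, not the code from
userns_uid0: force_dumpable() is a hypothetical helper name, and the
sketch only shows the prctl() calls in the calling process. It does not
show what happens to the flag across execve().

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper (not taken from userns_uid0): mark the calling
 * process dumpable, then read the flag back.
 * Returns the value reported by PR_GET_DUMPABLE, or -1 on failure.
 */
int force_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    /* PR_GET_DUMPABLE returns the current flag as the call's result. */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

In the child this would be called between the capset() step and
execve(), i.e. after step 2.d above.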

> The directories are world readable so they don't matter.
>
> What puzzles me is that you have directories owned by nobody, and you
> are talking about uid = 99 and gid = 99. Nobody is traditionally
> (u16_t)-2 and should never actually be used by anyone. It is
> used as the default number for unmapped uids and gids.
>
> It looks like you are doing something weird with nobody so I don't have
> a clue what is actually going on.
>

The issue is not specific to uid 99 or "nobody". It's just a dummy user
I have for testing. The issue happens with any user with a non-zero uid.
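
To check this for other uids without reading through ls output, the
owner of a single proc file can be fetched with stat(). A minimal
sketch, assuming /proc is mounted; proc_file_owner() is a hypothetical
helper name, and the ids it returns are as seen from the caller's user
namespace:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: report the owner and group that stat() returns
 * for /proc/<pid>/<name>.
 * Returns 0 on success, -1 if the path is too long or stat() fails.
 */
int proc_file_owner(pid_t pid, const char *name, uid_t *uid, gid_t *gid)
{
    char path[256];
    struct stat st;
    int n;

    n = snprintf(path, sizeof(path), "/proc/%ld/%s", (long)pid, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if (stat(path, &st) != 0)
        return -1;

    *uid = st.st_uid;
    *gid = st.st_gid;
    return 0;
}
```

Comparing the result for e.g. "status" against the first field of the
"Uid:" line in /proc/<pid>/status shows whether the file is owned by the
process uid or by global root.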


> Eric

Thanks,
--
Aditya