Re: Odd interaction with file capabilities and procfs files

From: Daniel Xu
Date: Thu Oct 20 2022 - 17:36:23 EST


On Thu, Oct 20, 2022, at 1:44 AM, Christian Brauner wrote:
> On Wed, Oct 19, 2022 at 03:42:42PM -0600, Daniel Xu wrote:
>> Hi Christian,
>>
>> On Wed, Oct 19, 2022, at 7:22 AM, Christian Brauner wrote:
>> > On Tue, Oct 18, 2022 at 06:42:04PM -0600, Daniel Xu wrote:
>> >> Hi,
>> >>
>> >> (Going off get_maintainers.pl for fs/namei.c here)
>> >>
>> >> I'm seeing some weird interactions with file capabilities and S_IRUSR
>> >> procfs files. Best I can tell it doesn't occur with real files on my btrfs
>> >> home partition.
>> >>
>> >> Test program:
>> >>
>> >> #include <fcntl.h>
>> >> #include <stdio.h>
>> >>
>> >> int main()
>> >> {
>> >>         int fd = open("/proc/self/auxv", O_RDONLY);
>> >>         if (fd < 0) {
>> >>                 perror("open");
>> >>                 return 1;
>> >>         }
>> >>
>> >>         printf("ok\n");
>> >>         return 0;
>> >> }
>> >>
>> >> Steps to reproduce:
>> >>
>> >> $ gcc main.c
>> >> $ ./a.out
>> >> ok
>> >> $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
>> >> $ ./a.out
>> >> open: Permission denied
>> >>
>> >> It's not obvious why this happens, even after spending a few hours
>> >> going through the standard documentation and kernel code. It's
>> >> intuitively odd b/c you'd think adding capabilities to the permitted
>> >> set wouldn't affect functionality.
>> >>
>> >> Best I could tell the -EACCES error occurs in the fallthrough codepath
>> >> inside generic_permission().
>> >>
>> >> Sorry if this is something dumb or obvious.
>> >
>> > Hey Daniel,
>> >
>> > No, this is neither dumb nor obvious. :)
>> >
>> > Basically, if you set fscaps then /proc/self/auxv will be owned by
>> > root:root. You can verify this:
>> >
>> > #include <fcntl.h>
>> > #include <sys/types.h>
>> > #include <sys/stat.h>
>> > #include <stdio.h>
>> > #include <errno.h>
>> > #include <unistd.h>
>> >
>> > int main()
>> > {
>> >         struct stat st;
>> >         printf("%d | %d\n", getuid(), geteuid());
>> >
>> >         if (stat("/proc/self/auxv", &st)) {
>> >                 fprintf(stderr, "stat: %d - %m\n", errno);
>> >                 return 1;
>> >         }
>> >         printf("stat: %d | %d\n", st.st_uid, st.st_gid);
>> >
>> >         int fd = open("/proc/self/auxv", O_RDONLY);
>> >         if (fd < 0) {
>> >                 fprintf(stderr, "open: %d - %m\n", errno);
>> >                 return 1;
>> >         }
>> >
>> >         printf("ok\n");
>> >         return 0;
>> > }
>> >
>> > $ ./a.out
>> > 1000 | 1000
>> > stat: 1000 | 1000
>> > ok
>> > $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
>> > $ ./a.out
>> > 1000 | 1000
>> > stat: 0 | 0
>> > open: 13 - Permission denied
>> >
>> > So acl_permission_check() fails and returns -EACCES which will cause
>> > generic_permission() to rely on capable_wrt_inode_uidgid() which checks
>> > for CAP_DAC_READ_SEARCH which you don't have as an unprivileged user.
>>
>> Thanks for checking on this.
>>
>> That does explain the weirdness but at the expense of another
>> question: why do fscaps cause /proc/self/auxv to be owned by root?
>> Is that the correct semantics? This also seems rather unexpected.
>>
>> I'll take a look tonight and see if I can come up with any answers.
>
> Sorry I didn't explain this in more detail.
> You mostly uncovered the reasons as evidenced by the Twitter thread.
>
> Yes, this is expected. When a process gains privileges during exec, the
> kernel will make it non-dumpable. That includes a change of the
> e{g,u}id or fs{g,u}id of the process, s{g,u}id binary execution that
> results in a changed e{g,u}id, or execution of a binary with fscaps set
> if the new permitted caps aren't a subset of the currently permitted caps.
>
> The last reason is what causes your sample program's /proc/self to be
> owned by root. The culprit here is cred_cap_issubset() which is called
> during commit_creds() in begin_new_exec().
>
> If the dumpable attribute is not set (i.e., it has a value other than 1)
> then all files in /proc/<pid> will be owned by (userns) root. To get the
> full picture you'd need to at least read man proc(5), man execve(2), and
> man prctl(2).
>
> The reason behind the dumpability change is to prevent unprivileged users
> from making privilege-elevating binaries (e.g., s{g,u}id binaries) crash to
> produce (userns-)root-owned coredumps which can be used in exploits. A
> fairly recent example of this is:
> https://alephsecurity.com/2021/10/20/sudump/
> https://www.openwall.com/lists/oss-security/2021/10/20/2

Thanks for the detailed explanation! I think each piece makes sense to
me now, even if the final result is a little odd. One of those things, I
guess :).

I'll see if a patch to the man-pages is appropriate.

Thanks,
Daniel