Re: [PATCH v2 1/3] namei: implement O_BENEATH-style AT_* flags

From: Aleksa Sarai
Date: Sat Oct 27 2018 - 11:37:51 EST


On 2018-10-27, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Oct 27, 2018 at 06:17:29PM +1100, Aleksa Sarai wrote:
>
> > I'm going to send out a v4 "soon" but I would like to know what folks
> > think about having resolveat(2) (or similar) to separate the scoping O_*
> > flags and produce an O_PATH -- since unsupported O_* flags are ignored
> > by older kernels userspace will have to do some plenty of checking after
> > each path operation.
> >
> > Personally, I believe this (along with AT_EMPTY_PATH for openat(2))
> > would help with some other O_PATH issues.
>
> The trouble with resolveat(2) is that for anything directory-modifying
> you really want directory locked before the lookup for last component.
> IOW, funlink(2) et.al. are hopeless.

Ah, right... In those cases I think that AT_SYMLINK_NOFOLLOW could help,
or maybe we would need to have some of the scoping flags for other
syscalls (though this would be an issue in either case for scoping
unlinkat(2) -- even if we used O_BENEATH). :/

But my main issue at the moment with O_PATH is that /proc/self/fd/...
re-opening allows for some very odd delayed-access-check attacks.
openat(2) doesn't give you an O_EMPTYPATH but that is what you get with
/proc.

For instance, take /proc/self/exe. Tautologically, it is impossible to
open it O_RDWR -- if you are resolving it through an "exe" magic symlink
then it is being used as a process's ->mm (and thus is locked from
writing). *However* you can open it O_PATH and then later re-open it
through /proc/self/fd. We had cases where a container runtime joining a
container would be able to do this and overwrite the container binary
*on the host*. This has been mitigated now (as part of CVE-2016-9962),
but I would be very shocked if there was no other places where this sort
of thing would happen.

My proposal for resolveat(2) would let us have some sort of "I want
these access bits" API for O_PATH. Of course there are some quite
not-nice changes I think you'd need to allow for this usecase -- my
back-of-the-envelope proposal would be to stash away the fmode bits
inside 'struct path' so that do_last() can see whether we are doing a
re-open of an existing 'struct file' (but I'm sure this sounds awful).

Is this a problem you think deserves solving / is there a better way of
going about it? Thanks.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachment: signature.asc
Description: PGP signature