Re: [PATCH v2 0/5] pid: add pidfd_open()

From: Jann Horn
Date: Mon Apr 01 2019 - 09:43:44 EST


On Mon, Apr 1, 2019 at 2:04 PM Christian Brauner <christian@xxxxxxxxxx> wrote:
> On Sun, Mar 31, 2019 at 08:13:38PM -0600, Andy Lutomirski wrote:
> > > On Mar 31, 2019, at 3:17 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >> On Sun, Mar 31, 2019 at 2:10 PM Christian Brauner <christian@xxxxxxxxxx> wrote:
> > >>
> > >> I don't think that we want or can make them equivalent since that would
> > >> mean we depend on procfs.
> > >
> > > Sure we can.
> > >
> > > If /proc is enabled, then you always do that dance YOU ALREADY WROTE
> > > THE CODE FOR to do the stupid ioctl.
> > >
> > > And if /procfs isn't enabled, then you don't do that.
> > >
> > > Ta-daa. Done. No stupid ioctl, and now /proc and pidfd_open() return
> > > the same damn thing.
> > >
> > > And guess what? If /proc isn't enabled, then obviously pidfd_open()
> > > gives you the /proc-less thing, but at least there is no crazy "two
> > > different file descriptors for the same thing" situation, because then
> > > the /proc one doesn't exist.
> > >
> >
> > I wish we could do this, and, in a clean design, it would be a no-brainer. But /proc has too much baggage. Just to mention two such things, there's "net" and "../sys". This crud is why we have all kinds of crazy rules that prevent programs in sandboxes from making new mounts and mounting /proc in them. If we make it possible to clone a new process and thus access /proc without having /proc mounted, we'll open up a big can of worms.
> >
> > Maybe we could have a sanitized view of /proc and make a pidfd be a directory fd pointing at that.
>
> We can also just create something like an internal bind-mount without a
> parent, i.e. similar to
>
> open_tree(<internal-procfs-mount>, "<pid>", OPEN_TREE_CLONE);
>
> on a clone(CLONE_PIDFD);
>
> that would block any openat(fd, "..");

Or we add a check to follow_dotdot()/follow_dotdot_rcu() that throws
an error if nd->path.mnt->mnt_flags has some new flag for "no dotdot
traversal on this mountpoint", and then set that on the internal procfs
mount... if Al Viro doesn't think that that's too hideous.
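
To make the shape of that check concrete, here is a rough userspace model of it. This is only a sketch and not kernel code: the flag name MNT_NODOTDOT, its value, the toy_* types, and the choice of -EXDEV as the error are all assumptions made for illustration. The only thing taken from the proposal above is the test on nd->path.mnt->mnt_flags.

```c
#include <errno.h>
#include <stddef.h>

/* HYPOTHETICAL flag: no such mount flag exists; name and value are made up. */
#define MNT_NODOTDOT 0x1

/* Toy stand-ins for the kernel's vfsmount, dentry, path and nameidata. */
struct toy_mount {
	int mnt_flags;
};

struct toy_dentry {
	struct toy_dentry *parent;	/* NULL for the root of the toy tree */
};

struct toy_path {
	struct toy_mount *mnt;
	struct toy_dentry *dentry;
};

struct toy_nameidata {
	struct toy_path path;
};

/*
 * Model of the proposed check: refuse ".." outright when the mount we are
 * currently walking in carries the (hypothetical) flag. Otherwise step to
 * the parent, staying put at the root as ".." normally does.
 */
static int toy_follow_dotdot(struct toy_nameidata *nd)
{
	if (nd->path.mnt->mnt_flags & MNT_NODOTDOT)
		return -EXDEV;	/* error code is an assumption */

	if (nd->path.dentry->parent)
		nd->path.dentry = nd->path.dentry->parent;
	return 0;
}
```

With the flag set on the internal procfs mount, a ".." lookup on the pidfd would fail before the walk moves anywhere, while lookups on ordinary mounts behave as before.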