Re: [PATCH 0/1] devpts: use dynamic_dname() to generate proc name

From: Christian Brauner
Date: Thu Aug 24 2017 - 19:37:47 EST


On Thu, Aug 24, 2017 at 06:01:36PM -0500, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > On Thu, Aug 24, 2017 at 1:43 PM, Eric W. Biederman
> > <ebiederm@xxxxxxxxxxxx> wrote:
> >>
> >> There are just enough weird one off scripts like xen image builder (I
> >> think that was the nasty test case that broke in debian) that I can't
> >> imagine ever being able to responsibly remove the path based lookups in
> >> /dev/ptmx. I do dream of it sometimes.
> >
> > Not going to happen.
>
> Which is what I said.
>
> > The fact is, /dev/ptmx is simply the standard location.
> > /dev/pts/ptmx simply is *not*.
>
> The standard is posix_openpt(). That is a syscall on the BSDs.
> Opening something called ptmx at this point is a Linuxism.
>
> There are a lot of programs that are going to be calling posix_openpt()
> simply because /dev/ptmx cannot be counted on to exist.
>
> > So pretty much every single user that ever uses pty's will use
> > /dev/ptmx, it's just how it has always worked.
> >
> > Trying to change it to anything else is just stupid. There's no
> > upside, there is only downsides - mainly the "we'll have to support
> > the standard way anyway, that newfangled way doesn't add anything".
>
> Except the newfangled way does add quite a bit. Not everyone who
> mounts devpts has permission to call mknod. So /dev/ptmx frequently
> winds up either being a bind mount or a symlink to /dev/pts/ptmx in
> containers.

In fact, /dev/ptmx being a symlink or bind mount is the *standard* in containers,
even for containers that are not user-namespaced or that simply do not retain
CAP_MKNOD.

>
> It is going to take a long time, but device nodes look like one of those
> filesystem features that are very slowly on their way out.

This relates to the point above: the fact that we can mount a devpts instance at
its standard location but cannot also have/create an additional device node
at the *standard location* is usually quite irritating for people who do not
know about this "legacy" behaviour. But yeah, it's probably going away; that's
just going to take a long, long time. I agree that userspace is the place to
slowly make the transition, though. :)

>
> > Our "pts" lookup isn't expensive.
> >
> > So quite frankly, we should discourage people from using the
> > non-standard place. It really has no real advantages, and it's simply
> > not worth it.
>
> The "pts" lookup admittedly isn't expensive at runtime. I could probably
> measure a cost, but anyone who is creating ptys fast enough to care
> likely has other issues.
>
> The "pts" lookup does have some real maintenance costs as it takes
> someone with a pretty deep understanding of things to figure out what is
> going on. I hope things have finally been abstracted well enough, and
> the code is used heavily enough we don't have to worry about a
> regression there. I still worry.
>
> As for non-standard locations: anything that isn't /dev/ptmx and
> /dev/pts/NNN simply won't work for anything that isn't very specialized.

I was mainly asking about non-standard locations because I experienced weird
behaviour when trying to open("/mnt/<slave-idx>", O_RDWR | O_NOCTTY). Mind you, I
performed all the steps that grantpt() + unlockpt() usually do, but purely
file-descriptor based. I think this was due to the earlier, faulty TIOCGPTPEER
implementation, which should now be fixed.

> At which point I don't think there is any reason to skip using the ptmx
> node on the devpts filesystem as you have already given up compatibility
> with everything else.
>
> But I agree it doesn't look worth it to change glibc to deal with an
> alternate location for /dev/ptmx. I see a huge point in changing glibc
> to use the new TIOCGPTPEER ioctl when available, as that is really the
> functionality the glibc internals are after.

That's a patch I've been looking into. But TIOCGPTPEER alone won't be enough. A
couple of other functions, such as grantpt(), need to switch from path-based
operations to file-descriptor-based operations too (something I tried to point
out in one of my previous mails). The whole user-space API could, imho, do
with a redo. The kernel is mostly doing the right thing and exposing the right
bits; TIOCGPTPEER is a good step. But on the user-space side it's actually a
bit of a security nightmare as soon as namespaces and - sorry for the buzzword -
*containers* come into play. @Eric, are you going to be at Plumbers again this
year? That might be a good chance to discuss some of this if there's still
interest.

Christian

>
> Eric