Re: [PATCH RFC] pidns: introduce syscall getvpid

From: Serge Hallyn
Date: Tue Sep 15 2015 - 13:41:52 EST


Quoting Stéphane Graber (stgraber@xxxxxxxxxx):
> On Tue, Sep 15, 2015 at 06:01:38PM +0300, Konstantin Khlebnikov wrote:
> > On 15.09.2015 17:27, Eric W. Biederman wrote:
> > >Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> writes:
> > >
> > >>pid_t getvpid(pid_t pid, pid_t source, pid_t target);
> > >>
> > >>This syscall converts pid from one pid-ns into pid in another pid-ns:
> > >>it takes @pid in namespace of @source task (zero for current) and
> > >>returns related pid in namespace of @target task (zero for current too).
> > >>If pid is unreachable from target pid-ns then it returns zero.
> > >
> > >This interface as presented is inherently racy. It would be better
> > >if source and target were file descriptors referring to the namespaces
> > >you wish to translate between.
> >
> > Yep, it's racy, as is any operation on non-child pids.
> > With file descriptors for source/target the result would be racy anyway.
> >
> > >
> > >>Such conversion is required for interaction between processes from
> > >>different pid-namespaces. For example when system service talks with
> > >>client from isolated container via socket about task in container:
> > >
> > >Sockets are already supported. At least the metadata of sockets is.
> > >
> > >Maybe we need this but I am not convinced of its utility.
> > >
> > >What are you trying to do that motivates this?
> >
> > I'm working on a hierarchical container management system which
> > allows creating and controlling nested sub-containers from containers
> > ( https://github.com/yandex/porto ). The main server runs on the host and
> > has to interact with all levels of nested namespaces. This syscall
> > makes some operations much easier: the server needs to remember only the
> > pid in the host pid namespace and can convert it into the right vpid on demand.
>
> Note that as Eric said earlier, sending a PID inside a ucred through a
> unix socket will have the pid translated.
>
> So while your solution certainly should be faster, you can already achieve
> what you want today by doing:
>
> == Translate PID in container to PID in host
> - open a socket
> - setns to container's pidns
> - send ucred from that container containing the requested container PID
> - host sees the host PID
>
> == Translate PID on host to PID in container
> - open a socket
> - setns to container's pidns
> - send ucred from the host containing the requested host PID
> (send will fail if the host PID isn't part of that container)
> - container sees the container PID
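
For illustration, the ucred mechanism in the steps above can be sketched
roughly as follows. This is only a minimal sketch, not a full translator:
both ends of the socketpair live in the same pid namespace here, so the pid
comes back unchanged. The helper names (send_pid, recv_pid,
translate_via_socketpair) are made up for this example. In a real setup one
end of the socket would be held by a helper that has done setns() into the
container's pidns and then forked, and sending a pid other than your own
needs CAP_SYS_ADMIN.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } as laid out on Linux.
UCRED = struct.Struct("iII")


def send_pid(sock, pid):
    """Send one byte with an SCM_CREDENTIALS message carrying `pid`.

    The kernel checks the credentials: an unprivileged sender may only
    pass its own pid (and its own uid/gid).
    """
    creds = UCRED.pack(pid, os.getuid(), os.getgid())
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, creds)])


def recv_pid(sock):
    """Receive one message and return the pid from its credentials.

    The pid is reported as seen from the receiver's pid namespace.
    """
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_SPACE(UCRED.size))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            pid, _uid, _gid = UCRED.unpack(data[:UCRED.size])
            return pid
    raise RuntimeError("no SCM_CREDENTIALS received")


def translate_via_socketpair(pid):
    """Round-trip `pid` through a unix socketpair (hypothetical helper).

    Both ends are in the same pid namespace in this sketch; the setns()
    and fork steps from the list above are deliberately left out.
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with sender, receiver:
        # The receiver must ask for credentials to be delivered.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        send_pid(sender, pid)
        return recv_pid(receiver)


if __name__ == "__main__":
    print(translate_via_socketpair(os.getpid()))
```

If the receiving end sits in a different pid namespace, the pid it reads from
the credentials is the sender-supplied pid as seen from the receiver's
namespace, which is the translation described above.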

In addition, since commit e4bc332451 ("/proc/PID/status: show all sets of
pid according to ns") we now also have 'NSpid' etc. in /proc/$$/status.
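
A rough sketch of reading that field, assuming a kernel new enough to have
it. The function names parse_nspid and read_nspid are just for illustration.
The values on the NSpid line start with the pid as seen from the pid
namespace of the procfs mount being read and continue through successively
nested namespaces, ending with the task's own.

```python
def parse_nspid(status_text):
    """Return the NSpid values found in the text of /proc/PID/status.

    Returns an empty list if the field is missing (e.g. on a kernel
    that predates it).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(pid="self"):
    """Read and parse NSpid for `pid` (a pid number, or "self")."""
    with open(f"/proc/{pid}/status") as f:
        return parse_nspid(f.read())


if __name__ == "__main__":
    print(read_nspid())
```

So for a task one level down, the first value is the pid as the reader of
/proc sees it and the last value is the pid the task sees for itself.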

-serge