Re: [PATCH RFC v3 2/2] pidns: introduce syscall getvpid

From: Konstantin Khlebnikov
Date: Tue Oct 20 2015 - 06:04:36 EST


On 28.09.2015 19:57, Eric W. Biederman wrote:
Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> writes:

If pid is negative, getvpid() returns the pid of the parent of task -pid.

Now that I am noticing this, I don't think I have seen any discussion
justifying a syscall that gets another process's parent pid. My
apologies if I just missed it.


Sorry for the late response. This completely slipped my mind after LinuxCon.

Why do we want the parent pid? What can we usefully do with it?
Is proc really that bad of an interface?

Fetching a parent pid feels like a separate logical operation
from pid translation, which makes me a bit uneasy about this
part of the conversation.

Yep, the proc interface is bad. /proc/$pid/stat is almost impossible to
parse without flaws, because a task can set the second field, "comm", to
any string and fake the ppid, for example ") Z 1". /proc/$pid/status
is better, but it contains more information and is thus slower.
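To make the /proc/$pid/stat pitfall concrete, here is a minimal C sketch of the usual workaround, assuming the field order documented in proc(5) (pid, comm, state, ppid, ...). Since comm is the only field that can contain ')', a parser should split at the last ')' rather than the first. The helper names below are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse "state ppid" following the ')' that ends comm; -1 if malformed. */
static pid_t parse_after_comm(const char *end)
{
    char state;
    int ppid;

    if (!end || sscanf(end + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Naive: stops at the FIRST ')', so a crafted comm can fake the ppid. */
static pid_t stat_line_ppid_naive(const char *line)
{
    return parse_after_comm(strchr(line, ')'));
}

/* Robust: no field after comm contains ')', so the LAST ')' ends comm. */
static pid_t stat_line_ppid(const char *line)
{
    return parse_after_comm(strrchr(line, ')'));
}

/* Read /proc/<pid>/stat and return its ppid field, or -1 on failure. */
static pid_t proc_ppid(pid_t pid)
{
    char path[64], buf[4096];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return stat_line_ppid(buf);
}
```

With a comm of "evil) Z 1", the line reads "1234 (evil) Z 1) S 42 ...": the naive parser reports ppid 1, while the last-')' parser recovers 42. This only repairs the parsing flaw; it does nothing about the cost of generating the whole line.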

This trick for a distant getppid looks cheap and useful:
in this interface, the space of negative pids is free for use.


Examples:
getvpid(pid, ns, -1) - get pid in our pid namespace
getvpid(pid, -1, ns) - get pid in container
getvpid(pid, -1, ns) > 0 - is pid reachable from container?
getvpid(1, ns1, ns2) > 0 - is ns1 inside ns2?
getvpid(1, ns1, ns2) == 0 - is ns1 outside ns2?
getvpid(1, ns, -1) - get init task of pid-namespace
getvpid(-1, ns, -1) - get reaper of init task in parent pid-namespace
getvpid(-pid, -1, -1) - get ppid by pid
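The semantics in the examples above can be checked against a toy userspace model. This is not the patch's implementation: toy_ns, toy_task and toy_getvpid() are hypothetical names, each task stores one pid per nesting level in the spirit of the kernel's per-level numbers in struct pid, and the caller passes namespace pointers directly where the proposed syscall would take namespace descriptors or -1 for the caller's own namespace. Returning -1 for a pid not found in the source namespace is an assumption of this sketch.

```c
#include <stddef.h>
#include <sys/types.h>

#define TOY_MAX_LEVEL 4

/* A pid namespace: level 0 is the initial namespace. */
struct toy_ns {
    int level;
    const struct toy_ns *parent;    /* NULL for the initial namespace */
};

/* nr[i] is the task's pid in its ancestor namespace at level i,
 * for i = 0 .. ns->level. */
struct toy_task {
    const struct toy_ns *ns;        /* namespace the task lives in */
    pid_t nr[TOY_MAX_LEVEL];
    const struct toy_task *parent;
};

/* True if 'inner' is 'outer' itself or nested somewhere below it. */
static int toy_ns_contains(const struct toy_ns *outer, const struct toy_ns *inner)
{
    while (inner && inner->level > outer->level)
        inner = inner->parent;
    return inner == outer;
}

/* Pid of t as seen from ns, or 0 if t is not visible there. */
static pid_t toy_pid_nr_ns(const struct toy_task *t, const struct toy_ns *ns)
{
    if (!t || !toy_ns_contains(ns, t->ns))
        return 0;
    return t->nr[ns->level];
}

/* Find the task whose pid in ns is nr (nr must be positive). */
static const struct toy_task *toy_find(const struct toy_task *tasks, size_t n,
                                       pid_t nr, const struct toy_ns *ns)
{
    for (size_t i = 0; i < n; i++)
        if (toy_pid_nr_ns(&tasks[i], ns) == nr)
            return &tasks[i];
    return NULL;
}

/* Toy getvpid(pid, source, target): translate pid from source to target.
 * A negative pid asks for the parent of task -pid instead.
 * Returns 0 if the result is invisible in target, -1 if not found. */
static pid_t toy_getvpid(const struct toy_task *tasks, size_t n, pid_t pid,
                         const struct toy_ns *source, const struct toy_ns *target)
{
    const struct toy_task *t;

    if (pid == 0)
        return -1;
    t = toy_find(tasks, n, pid < 0 ? -pid : pid, source);
    if (!t)
        return -1;
    if (pid < 0)
        t = t->parent;      /* may be NULL: toy_pid_nr_ns() then returns 0 */
    return toy_pid_nr_ns(t, target);
}
```

For instance, with a container namespace "box" nested in the initial namespace "root", toy_getvpid(tasks, n, 1, &box, &root) yields the container init's pid in root (nonzero, so box is inside root), while swapping the two namespaces yields 0.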

As I step back and pay attention to this case, I am half wondering if
perhaps what would be most useful is a file descriptor that refers
to a pid, plus an updated set of system calls that take pid file
descriptors instead of pids.

An fd that pins pids isn't a good idea.

I think it's better to refer to (but not hold) the task rather than the pid.
For example, the inode of a taskfd could hold a small buffer for the task's
exit status: the task holds a reference to its own taskfd inode and
populates the status when it exits. There would be no zombies and no
delayed reaping.

Something like:

task_fd = clonefd()
...
select(...)
exit(...)
pread(task_fd, &status_rusage_etc, sizeof(status_rusage_etc), 0);
close(task_fd);

The task pid could also be part of the structure in that fd. Potentially it
could provide the same information as /proc/$pid/... in an efficient
binary format: we could read only the required fields of the structure,
and the kernel could skip unneeded calculations.


Something like:

getpidfd(int pidnsfd, pid_t pid);

waitfd(int pidfd, int *status, int options, struct rusage *rusage);

killfd(int pidfd, int sig);

clonefd(...);

And perhaps:
pid_nr_ns(int pidnsfd, int pidfd);

parentfd(int pidfd);

Eric


--
Konstantin