RE: [RFC]Pid conversion between pid namespace

From: chenhanxiao@xxxxxxxxxxxxxx
Date: Mon Jul 21 2014 - 06:47:58 EST


Hi,

> -----Original Message-----
> From: Serge Hallyn [mailto:serge.hallyn@xxxxxxxxxx]
> Sent: Tuesday, July 15, 2014 12:16 PM
> To: Chen, Hanxiao/陈 晗霄
> Subject: Re: [RFC]Pid conversion between pid namespace
> > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > pros:
> > - ns procfs free, easy to use.
> > We could get rid of mounted ns procfs.
> >
> > cons:
> > - may find multiple results in nested ns.
> > We wished the new API could tell us the exact answer.
> > But if getnspid returns more than one result, that will bring trouble to admins,
>
> (See below for more, but) the question being posed to getnspid has precisely
> one answer.
>
> > they would have to make another decision.
> > Or we could mark the deepest level for translation as a prerequisite.
> >
> > - based on current pidns, no reference ns.
>
> Hm, no. The intent here was that
>
> observer_pid would be in current ns
> query_pid would be in observer_pid's ns.
>
> So this would be ideal for "I got a pid in a logfile created by rsyslog in
> a nested container, what is the logged pid in my pidns."
>
> Taking a set of tasks (like a container with nesting) and building a tree
> of all pids shouldn't be too difficult either. Start with the init pid,
> call getnspid($pid, $init_pid) for every $pid in the container; to figure
> out whether any $pid is itself a nested init_pid, we can compare the
> /proc/$$/ns/pid, as well as look at getnspid($pid, $pid).
I'm a little confused by this section:

Ex:
      init_pid_ns   ns1      ns2
t1    2
t2    `- 3          1
t3    `- 4          `- 5     1
t4    `- 6          `- 8     `- 9
t5    `- 10         `- 9     `- 10

For getnspid($pid, $init_pid),
does init_pid mean the container's init pid, such as 3 for t2?

In nested containers, does this syscall work like this:
  getnspid(9, 4) -> (6, 8, 9)
where 9 is a pid in ns2 and 4 is t3's pid in init_pid_ns (the current ns)?

And for:
  getnspid($pid, $pid)
if a task's pid on the host and its pid in the container happen to be the same,
as with getnspid(10, 10) for t5, the check may not work.
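
To make the question concrete, here is a toy model of the table above under one reading of Serge's description (observer_pid is looked up in the current ns, query_pid in observer_pid's ns, and the result is the queried task's pid in the current ns). It is only a sketch, not kernel code: getnspid_model, find_task and the TASKS table are names made up for illustration, and it assumes the caller sits in init_pid_ns and that each column of the table is a single namespace.

```python
# Toy model of the example table; nothing here is a real kernel interface.
# Each task lists its pid at every level where it is visible, outermost
# first: index 0 = init_pid_ns, index 1 = ns1, index 2 = ns2.
TASKS = {
    "t1": [2],
    "t2": [3, 1],
    "t3": [4, 5, 1],
    "t4": [6, 8, 9],
    "t5": [10, 9, 10],
}


def find_task(pid, level):
    """Return the name of the task that has `pid` at `level`, or None."""
    for name, pids in TASKS.items():
        if len(pids) > level and pids[level] == pid:
            return name
    return None


def getnspid_model(query_pid, observer_pid, current_level=0):
    """Hypothetical reading of getnspid(query_pid, observer_pid).

    observer_pid is resolved in the caller's ns (current_level),
    query_pid is resolved in the observer's own (deepest) ns, and the
    queried task's pid as seen from the caller's ns is returned.
    """
    observer = find_task(observer_pid, current_level)
    if observer is None:
        return None
    observer_level = len(TASKS[observer]) - 1
    target = find_task(query_pid, observer_level)
    if target is None:
        return None
    return TASKS[target][current_level]


if __name__ == "__main__":
    print(getnspid_model(9, 4))    # observer t3 lives in ns2; pid 9 there is t4
    print(getnspid_model(10, 10))  # t5: host pid and ns2 pid coincide
    print(getnspid_model(4, 4))    # no task has pid 4 inside ns2
```

Under this reading getnspid(9, 4) gives the single value 6 rather than the tuple (6, 8, 9), and getnspid(10, 10) returns 10 for t5 only because the two pids coincide, which is the case I am unsure about.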

Thanks,
- Chen
>
> > B) make/change proc file/directories
> > B-1) expand /proc/pid/status
> > pros:
> > - easy to use and to debug
> > - the interface already exists in the kernel
> >
> > cons:
> > - based on current ns
> > for a middle level, we would have to make another decision.
> > - does not have hierarchy info.
> >
> > B-2) /proc/<pidX>/ns/proc/ which would contain everything
> > pros:
> > - has enough info from /proc in the container
> >
> > cons:
> > - Requirements unclear.
> > We need more discussion to decide which items should not be exposed.
> > - does not have hierarchy info.
> >
> >
> > How about doing these things in two steps:
> >
> > C) 1. expose all sets of pid, pgid, sid and tgid
> > via expanded /proc/PID/status
> > We could get translated IDs from container like:
> > NStgid: 16465 5 1
> > NSpid: 16465 5 1
> > NSpgid: 16465 5 1
> > NSsid: 16423 1 0
> > (a set of IDs with 3 levels of ns)
> >
> > 2. add hierarchy info under /proc
> > We lack a method of getting hierarchy info, which would be useful.
> > Then we could know the relationship of ns.
> > How about adding a new proc file just under /proc
> > to show the hierarchy as readlink does:
> > pid:[4026531836]-> [4026532390] -> [4026532484]
> > pid:[4026531836]-> [4026532491]
> > (a 3-level pid ns and a 2-level pid ns)
> >
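The status lines and hierarchy lines proposed above would be simple to consume from userspace. A minimal Python sketch, assuming the files look exactly like the samples in this mail (the format is only a proposal at this point; parse_ns_ids and parse_ns_chain are hypothetical helper names):

```python
import re

# Sample text copied from the proposal above, not read from a real /proc.
SAMPLE_STATUS = """\
NStgid: 16465 5 1
NSpid: 16465 5 1
NSpgid: 16465 5 1
NSsid: 16423 1 0
"""

SAMPLE_HIERARCHY = """\
pid:[4026531836]-> [4026532390] -> [4026532484]
pid:[4026531836]-> [4026532491]
"""


def parse_ns_ids(status_text):
    """Map each NS* field to its list of IDs, outermost namespace first."""
    ids = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("NS"):
            ids[key] = [int(v) for v in value.split()]
    return ids


def parse_ns_chain(line):
    """Extract the namespace inode numbers from one hierarchy line."""
    return [int(n) for n in re.findall(r"\[(\d+)\]", line)]


if __name__ == "__main__":
    ids = parse_ns_ids(SAMPLE_STATUS)
    # Last entry of NSpid is the pid inside the innermost namespace.
    print(ids["NSpid"][0], "on the host is pid", ids["NSpid"][-1], "inside")
    for line in SAMPLE_HIERARCHY.splitlines():
        print(len(parse_ns_chain(line)), "levels:", parse_ns_chain(line))
```

Reading the first and last entries of NSpid gives the host-to-container translation directly, and the length of each chain gives the nesting depth.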
> > Any comments would be appreciated.
> >
> > Thanks,
> > - Chen
> >
> > > -----Original Message-----
> > > Subject: [RFC]Pid conversion between pid namespace
> > >
> > > Hi,
> > >
> > > We had some discussions on how to carry out
> > > pid conversion between pid namespace via:
> > > syscall[1] and procfs[2].
> > >
> > > Pavel suggested a syscall that converts
> > > (ID, NS1, NS2) into (ID).
> > >
> > > Serge suggested that a syscall
> > > pid_t getnspid(pid_t query_pid, pid_t observer_pid).
> > >
> > >
> > > Eric and Richard suggested a procfs solution is
> > > more appropriate.
> > >
> > > Oleg suggested that we should expand /proc/pid/status
> > > to report this kind of information.
> > >
> > > And Richard suggested adding a directory like
> > > /proc/<pidX>/ns/proc/ which would contain everything
> > > from /proc/<pidX inside the namespace>/.
> > >
> > > As procfs provides a more user-friendly interface,
> > > how about exposing all sets of tgid, pid, pgid, sid
> > > by expanding /proc/PID/status in procfs?
> > > And we could also expose ns hierarchy under /proc,
> > > which could be another reference.
> > >
> > > Ex:
> > >       init_pid_ns   ns1      ns2
> > > t1    2
> > > t2    `- 3          1
> > > t3    `- 4          `- 5     1
> > >
> > > We could get in /proc/t3/status:
> > > NSpid: 4 5 1
> > > We would know that pid 1 in the container is pid 4 in the init ns.
> > >
> > > And we could get ns hierarchy under /proc/ns_hierarchy like:
> > > init_ns->ns1->ns2 (as the result of readlink)
> > > ->ns3
> > > We would know that t3 is in ns2, and its hierarchy.
> > >
> > > How do these ideas look?
> > > Any comments would be appreciated.
> > >
> > > Thanks,
> > > - Chen
> > >
> > >
> > > [1] syscall
> > > http://lwn.net/Articles/602987/
> > >
> > > [2] procfs
> > > http://www.spinics.net/lists/kernel/msg1751688.html
> > >
> > > _______________________________________________
> > > Containers mailing list
> > > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > https://lists.linuxfoundation.org/mailman/listinfo/containers