Re: [RFC]Pid conversion between pid namespace

From: Serge Hallyn
Date: Thu Aug 07 2014 - 12:12:24 EST


Quoting chenhanxiao@xxxxxxxxxxxxxx (chenhanxiao@xxxxxxxxxxxxxx):
> Hi,
>
> > -----Original Message-----
> > From: Serge Hallyn [mailto:serge.hallyn@xxxxxxxxxx]
> > Sent: Tuesday, August 05, 2014 6:21 AM
> >
> > Quoting chenhanxiao@xxxxxxxxxxxxxx (chenhanxiao@xxxxxxxxxxxxxx):
> > > Hi,
> > >
> > > We discussed two ways of pid conversion:
> > > syscall and procfs.
> > >
> > > Both of them could do a pid translation job.
> > > But for ns hierarchy, syscall like:
> > >
> > > pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
> > > or
> > > pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)
> > >
> > > could not work, we knew a pid lived in one ns, but we
> >
> > Note I still disagree here.
> >
> > > did not know their relationships.
> > > For getting the entire set of pids, both of them can do.
> > >
> > > So using procfs is a better way.
> > >
> > > Ex:
> > >       init_pid_ns     ns1          ns2
> > > t1    2
> > > t2    `- 3            1
> > > t3       `- 4         `- 5         1
> > > t4          `- 6         `- 8      `- 9
> > > t5             `- 10        `- 9      `- 10
> > >
> > > 1. How procfs works:
> > > a) adding a nspid hierarchy under /proc/ like:
> > > [root@localhost proc]# tree /proc/nspid
> > > /proc/nspid
> > > ├── ns0
> > > │   ├── ns1
> >
> > Are these actually called 'ns1' etc? Adding a namespace of pid
> > namespace names is a bad thing.
>
> That's just an example.
> We are inclined to name it ns$(inum),
> like what we did in proc_ns_readlink.
>
> >
> > > │   ├── ns2
> > > │   │   └── pid -> /proc/9/ns
> > > │   └── pid -> /proc/4/ns
> > > └── pid -> /proc/1/ns
> > >
> > > We create a dir for each ns and add a link to the 1st process of that ns.
> >
> > How much more kernel space does this take up?
> >
>
> Only the first process of each newly created ns will be added here.
> So there would not be many entries.

Oh, I see.

> > Is there an easy way to go from a pid in your own namespace
> > to its proper node under /proc/nspid? I.e. if I am interested
> > in pid 9987, which happens to be pid 5 inside a container in
> > ns2, and then I want to know what it means when it (pid 9987)
> > is talking about 'pid 10'. Is there a link under /proc/9987/
> > leading to /proc/nspid/ns2/5 ?
>
> If you want to query pid 9987, you could:
> a) readlink /proc/9987/ns/pid
> b) refer to /proc/nspid/ns$(inum)/ns$(inum)..
> c) Also the link to the 1st new ns process could be found under ns$(inum).

This is good. Let's go with it.
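
For illustration, step a) above can be sketched as follows. This is a
minimal sketch, not an implementation of the proposal: it relies on
readlink of /proc/PID/ns/pid returning text of the form "pid:[inum]",
and nspid_dir_name() is a hypothetical helper that only mirrors the
ns$(inum) naming discussed here, since /proc/nspid does not exist yet.

```python
import os
import re

# readlink on /proc/<pid>/ns/pid yields text of the form "pid:[<inum>]".
NS_LINK_RE = re.compile(r"^pid:\[(\d+)\]$")


def parse_pidns_inum(link_target):
    """Extract the inode number from a 'pid:[inum]' link target."""
    m = NS_LINK_RE.match(link_target)
    if m is None:
        raise ValueError("unexpected ns link target: %r" % (link_target,))
    return int(m.group(1))


def pidns_inum(pid):
    """Return the pid namespace inode number of process `pid`."""
    return parse_pidns_inum(os.readlink("/proc/%d/ns/pid" % pid))


def nspid_dir_name(inum):
    # Hypothetical: follows the "ns$(inum)" naming proposed in this thread.
    # /proc/nspid is only a proposal, so nothing is looked up here.
    return "ns%d" % inum
```

With these, nspid_dir_name(pidns_inum(9987)) would give the directory
name to look for under /proc/nspid in step b).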

> Or as what you said above,

Nah. Let's not change /proc/PID/ns/pid.

> we could do some change in /proc/PID/ns/pid
> a) when new ns created, we put them under /proc/nspid
> b) create a link from /proc/PID/ns/pid to /proc/nspid/ns$(inum)/pid
>
> Then we could get a clearer view:
> 1. pidns view
> /proc/nspid
> ├── ns_4026531836 (ns0)
> │   ├── ns1
> │   │   ├── ns2
> │   │   └── pid -> pid:[4026531836]
> │   └── pid -> pid:[4026531816]
> └── pid -> pid:[4026531806]
>
> Then there will be a link under /proc/9987/ns/pid to ns2:
> 2. PID1 lives in ns0, PID2 lives in ns2
> /proc/PID1/ns/pid->/proc/nspid/ns_4026531806
>
> /proc/PID2/ns/pid->/proc/nspid/ns_4026531836
>
> >
> > > b) expose all sets of pid, pgid, sid and tgid
> > > via expanded /proc/PID/status
> > > We could get translated IDs from container like:
> > > NStgid: 6 8 9
> > > NSpid: 6 8 9
> > > NSpgid: 6 8 9
> > > NSsid: 6 1 0
> > > (a set of IDs with 3 levels of ns)
> >
> > This sure does seem the simplest route. But it actually still
> > does not provide us an easy answer to "what does pid 9987 mean
> > when it talks about pid 10?".
>
> Do you mean:
> init_pid_ns    ns1    ns2
> 9987           10     5
> Neither getnspid syscall nor proc/PID/status expansion
> could answer this without hierarchy information.
> For users in init_pid_ns, getnspid needs
> an observer pid that lives in ns1 and only in ns1,

Yes, good point. That's a definite disadvantage of getnspid
compared to your proc approach.

> or we should call getnspid in ns1.
> See below for more.
>
> >
> > > 2. Advantage of procfs solution
> > > a) easy to use:
> > > getnspid(6, 10) -> (10, 9, 10)
> > > or
> > > getnspid(10, ns1_fd, ns0_fd) -> 9
> > > getnspid(10, ns2_fd, ns0_fd) -> 10
> > >
> > > And we could also get it by:
> > > cat /proc/10/status | grep NSpid:
> > > NSpid: 10 9 10
> > > ...
> >
> > It looks nice, but I'm not convinced it gives us the info we
> > need.
> >
> > It's certainly possible that I've just not thought it through
> > enough.
> >
> > Question: are you proposing this (/proc/pid/status expansion) as an
> > alternative to /proc/nspid, or are they meant to be complementary?
> >
>
> We want /proc/nspid as a complement for pid translation.

Ok.
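
For reference, the NSpid line quoted above could be consumed like
this. A minimal sketch, assuming /proc/PID/status carries a line
"NSpid:" followed by whitespace-separated ids, outermost namespace
first, as in the example; the function names are illustrative only.

```python
def parse_ns_ids(status_text, field="NSpid"):
    """Return the list of ids on the `field:` line, or None if absent.

    The ids are assumed to be ordered from the outermost pid namespace
    to the innermost one, as in the "NSpid: 10 9 10" example.
    """
    prefix = field + ":"
    for line in status_text.splitlines():
        if line.startswith(prefix):
            return [int(tok) for tok in line[len(prefix):].split()]
    return None


def read_ns_ids(pid, field="NSpid"):
    """Read /proc/<pid>/status and parse the requested NS* field."""
    with open("/proc/%d/status" % pid) as f:
        return parse_ns_ids(f.read(), field)
```

The same parser covers NStgid, NSpgid and NSsid by passing a
different field name.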

> Ex:
>       init_pid_ns     ns1          ns2
> t1    2
> t2    `- 3            1
> t3       `- 4         `- 5         1
> t4          `- 6         `- 8      `- 9
> t5             `- 10        `- 9      `- 10
> Suppose we were in init_pid_ns:
> getnspid(9,4)->6 (t4)
> getnspid(9,3)->10(t5)
> We know t2 is in ns1 and t3 is in ns2, but we don't know their relationship.
> If we want to query pid 9 in ns1, we could use getnspid(9,3)->10(t5)
> but the pre-requisite is that we know ns2 is the child of ns1.

I like your proc approach. Do you have an implementation?
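
To make the example concrete, here is a toy model of the t1..t5 table.
It is only a sketch: TASKS and LEVEL are hand-copied from the table,
the hierarchy is assumed to be the linear chain init_pid_ns -> ns1 ->
ns2, and translate() is an illustrative name, not a proposed interface.

```python
# Pid of each task at every level where it is visible, outermost first.
TASKS = {
    "t1": [2],
    "t2": [3, 1],
    "t3": [4, 5, 1],
    "t4": [6, 8, 9],
    "t5": [10, 9, 10],
}

# Depth of each namespace in the (assumed linear) hierarchy.
LEVEL = {"init_pid_ns": 0, "ns1": 1, "ns2": 2}


def translate(pid, from_ns, to_ns):
    """Translate `pid` as seen in `from_ns` to its value in `to_ns`.

    Returns None if no task has that pid in `from_ns`, or if the task
    is not visible in `to_ns`.
    """
    src, dst = LEVEL[from_ns], LEVEL[to_ns]
    for ids in TASKS.values():
        if len(ids) > src and ids[src] == pid:
            return ids[dst] if len(ids) > dst else None
    return None
```

Once the level of each namespace is known, "pid 9" stops being
ambiguous: translate(9, "ns2", "init_pid_ns") and
translate(9, "ns1", "init_pid_ns") pick out different tasks (t4 and
t5), matching getnspid(9,4) and getnspid(9,3) above.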

-serge