Re: Documenting the ioctl interfaces to discover relationships between namespaces

From: Andrei Vagin
Date: Thu Dec 15 2016 - 07:26:58 EST


On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
> [was: [PATCH 0/4 v3] Add an interface to discover relationships
> between namespaces]
>
> Hello Andrei
>
> See below for my attempt to document the following.

Hi Michael,

Eric already did my work :). I have read this documentation and it looks
good to me. I have nothing to add to Eric's comments.

Thanks,
Andrei

>
> On 6 September 2016 at 09:47, Andrei Vagin <avagin@xxxxxxxxxx> wrote:
> > From: Andrey Vagin <avagin@xxxxxxxxxx>
> >
> > Each namespace has an owning user namespace, and currently there is no
> > way to discover these relationships.
> >
> > Pid and user namespaces are hierarchical. There is no way to discover
> > parent-child relationships either.
> >
> > Why might we want to know the relationships between namespaces?
> >
> > One use would be visualization, in order to understand the running
> > system. Another would be to answer the question: what capability does
> > process X have to perform operations on a resource governed by namespace
> > Y?
> >
> > One more use-case (which is usually called abnormal) is checkpoint/restart.
> > In CRIU we are going to dump and restore nested namespaces.
> >
> > There [1] was a discussion about which interface to choose to determine
> > relationships between namespaces.
> >
> > Eric suggested to add two ioctl-s [2]:
> >> Grumble, Grumble. I think this may actually be a case for creating ioctls
> >> for these two cases. Now that random nsfs file descriptors are bind
> >> mountable the original reason for using proc files is not as pressing.
> >>
> >> One ioctl for the user namespace that owns a file descriptor.
> >> One ioctl for the parent namespace of a namespace file descriptor.
> >
> > Here is an implementation of these ioctl-s.
> >
> > $ man man7/namespaces.7
> > ...
> > Since Linux 4.X, the following ioctl(2) calls are supported for
> > namespace file descriptors. The correct syntax is:
> >
> > fd = ioctl(ns_fd, ioctl_type);
> >
> > where ioctl_type is one of the following:
> >
> > NS_GET_USERNS
> > Returns a file descriptor that refers to an owning user
> > namespace.
> >
> > NS_GET_PARENT
> > Returns a file descriptor that refers to a parent namespace.
> > This ioctl(2) can be used for pid and user namespaces. For
> > user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
> > meaning.
> >
> > In addition to generic ioctl(2) errors, the following specific ones
> > can occur:
> >
> > EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
> >
> > EPERM The requested namespace is outside of the current namespace
> > scope.
> >
> > [1] https://lkml.org/lkml/2016/7/6/158
> > [2] https://lkml.org/lkml/2016/7/9/101
>
> The following is the text I propose to add to the namespaces(7) page.
> Could you please review and let me know of corrections and
> improvements.
>
> Thanks,
>
> Michael
>
>
> Introspecting namespace relationships
> Since Linux 4.9, two ioctl(2) operations are provided to allow
> introspection of namespace relationships (see user_namespaces(7)
> and pid_namespaces(7)). The form of the calls is:
>
> ioctl(fd, request);
>
> In each case, fd refers to a /proc/[pid]/ns/* file.
>
> NS_GET_USERNS
> Returns a file descriptor that refers to the owning user
> namespace for the namespace referred to by fd.
>
> NS_GET_PARENT
> Returns a file descriptor that refers to the parent
> namespace of the namespace referred to by fd. This operation is
> valid only for hierarchical namespaces (i.e., PID and user
> namespaces). For user namespaces, NS_GET_PARENT is
> synonymous with NS_GET_USERNS.
>
> In each case, the returned file descriptor is opened with O_RDONLY
> and O_CLOEXEC (close-on-exec).
>
> By applying fstat(2) to the returned file descriptor, one obtains
> a stat structure whose st_ino (inode number) field identifies the
> owning/parent namespace. This inode number can be matched with
> the inode number of another /proc/[pid]/ns/{pid,user} file to
> determine whether that is the owning/parent namespace.
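
The comparison described in this paragraph could be sketched as below.
This is only a minimal illustration, not part of the proposed page: the
helper name fd_matches_path() is hypothetical, and comparing st_dev in
addition to st_ino is an assumption on top of the text above (an inode
number is only guaranteed to be unique within one device).

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of any API): return 1 if 'fd' and the
   file at 'path' refer to the same object, 0 if they differ, and -1 if
   either of them cannot be examined. */
int
fd_matches_path(int fd, const char *path)
{
    struct stat fd_sb, path_sb;
    int path_fd, ret;

    if (fstat(fd, &fd_sb) == -1)
        return -1;

    path_fd = open(path, O_RDONLY);
    if (path_fd == -1)
        return -1;

    ret = fstat(path_fd, &path_sb);
    close(path_fd);
    if (ret == -1)
        return -1;

    /* Assumption: compare the device number as well as the inode
       number, since inode numbers are only unique per device. */
    return fd_sb.st_dev == path_sb.st_dev &&
           fd_sb.st_ino == path_sb.st_ino;
}
```

With userns_fd obtained from NS_GET_USERNS, a call such as
fd_matches_path(userns_fd, "/proc/self/ns/user") would then indicate
whether the caller's own user namespace is the owning one.
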
>
> Either of these ioctl(2) operations can fail with the following
> error:
>
> EPERM The requested namespace is outside of the caller's
> namespace scope. This error can occur if, for example, the
> owning user namespace is an ancestor of the caller's current
> user namespace. It can also occur on attempts to obtain
> the parent of the initial user or PID namespace.
>
> Additionally, the NS_GET_PARENT operation can fail with the
> following error:
>
> EINVAL fd refers to a nonhierarchical namespace.
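
To illustrate how the two errors above interact with the hierarchy, here
is a rough sketch that follows NS_GET_PARENT upward until the kernel
refuses with EPERM. The function name walk_ancestors() is hypothetical,
and the fallback definitions of NSIO and NS_GET_PARENT are copied from
the example program in this mail; treat it as an illustration under
those assumptions rather than a reference implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef NS_GET_PARENT
#define NSIO 0xb7
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

/* Hypothetical helper: starting from the PID or user namespace file at
   'path', print the inode number of each ancestor that is visible to
   the caller. Returns the number of ancestors found, or -1 (with errno
   set) on any error other than the expected terminating EPERM. */
int
walk_ancestors(const char *path)
{
    struct stat sb;
    int depth = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;

    for (;;) {
        int parent_fd = ioctl(fd, NS_GET_PARENT);

        if (parent_fd == -1) {
            int saved = errno;

            close(fd);
            errno = saved;
            /* EPERM: the next ancestor is out of scope, or 'fd'
               already refers to an initial namespace. EINVAL
               (nonhierarchical namespace) is reported as -1. */
            return (saved == EPERM) ? depth : -1;
        }

        close(fd);
        fd = parent_fd;

        if (fstat(fd, &sb) == -1) {
            int saved = errno;

            close(fd);
            errno = saved;
            return -1;
        }

        depth++;
        printf("ancestor %d: inode %lu\n", depth,
               (unsigned long) sb.st_ino);
    }
}
```

Called as walk_ancestors("/proc/self/ns/user") from the initial user
namespace, this would be expected to report no ancestors, since the
first NS_GET_PARENT call already fails with EPERM.
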
>
> See the EXAMPLE section for an example of the use of these
> operations.
>
> [...]
>
> EXAMPLE
> The example shown below uses the ioctl(2) operations described
> above to perform simple introspection of namespace relationships.
> The following shell sessions show various examples of the use of
> this program.
>
> Trying to get the parent of the initial user namespace fails, for
> the reasons explained earlier:
>
> $ ./ns_introspect /proc/self/ns/user p
> The parent namespace is outside your namespace scope
>
> Create a process running sleep(1) that resides in new user and UTS
> namespaces, and show that the new UTS namespace is associated with the
> new user namespace:
>
> $ unshare -Uu sleep 1000 &
> [1] 23235
> $ ./ns_introspect /proc/23235/ns/uts
> Inode number of owning user namespace is: 4026532448
> $ readlink /proc/23235/ns/user
> user:[4026532448]
>
> Then show that the parent of the new user namespace in the
> preceding example is the initial user namespace:
>
> $ readlink /proc/self/ns/user
> user:[4026531837]
> $ ./ns_introspect /proc/23235/ns/user
> Inode number of owning user namespace is: 4026531837
>
> Start a shell in a new user namespace, and show that from within
> this shell, the parent user namespace can't be discovered.
> Similarly, the UTS namespace (which is associated with the initial
> user namespace) can't be discovered.
>
> $ PS1="sh2$ " unshare -U bash
> sh2$ ./ns_introspect /proc/self/ns/user p
> The parent namespace is outside your namespace scope
> sh2$ ./ns_introspect /proc/self/ns/uts u
> The owning user namespace is outside your namespace scope
>
> Program source
>
> /* ns_introspect.c
>
> Licensed under GNU General Public License v2 or later
> */
> #include <stdlib.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/ioctl.h>
> #include <string.h>
> #include <errno.h>
>
> #ifndef NS_GET_USERNS
> #define NSIO 0xb7
> #define NS_GET_USERNS _IO(NSIO, 0x1)
> #define NS_GET_PARENT _IO(NSIO, 0x2)
> #endif
>
> int
> main(int argc, char *argv[])
> {
>     int fd, userns_fd, parent_fd;
>     struct stat sb;
>
>     if (argc < 2) {
>         fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
>                 argv[0]);
>         fprintf(stderr, "\nDisplay the result of one or both "
>                 "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
>                 "for the specified /proc/[pid]/ns/[file]. If neither "
>                 "'p' nor 'u' is specified,\n"
>                 "NS_GET_USERNS is the default.\n");
>         exit(EXIT_FAILURE);
>     }
>
>     /* Obtain a file descriptor for the 'ns' file specified
>        in argv[1] */
>
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1) {
>         perror("open");
>         exit(EXIT_FAILURE);
>     }
>
>     /* Obtain a file descriptor for the owning user namespace and
>        then obtain and display the inode number of that namespace */
>
>     if (argc < 3 || strchr(argv[2], 'u')) {
>         userns_fd = ioctl(fd, NS_GET_USERNS);
>
>         if (userns_fd == -1) {
>             if (errno == EPERM)
>                 printf("The owning user namespace is outside "
>                        "your namespace scope\n");
>             else
>                 perror("ioctl-NS_GET_USERNS");
>             exit(EXIT_FAILURE);
>         }
>
>         if (fstat(userns_fd, &sb) == -1) {
>             perror("fstat-userns");
>             exit(EXIT_FAILURE);
>         }
>         printf("Inode number of owning user namespace is: %ld\n",
>                (long) sb.st_ino);
>
>         close(userns_fd);
>     }
>
>     /* Obtain a file descriptor for the parent namespace and
>        then obtain and display the inode number of that namespace */
>
>     if (argc > 2 && strchr(argv[2], 'p')) {
>         parent_fd = ioctl(fd, NS_GET_PARENT);
>
>         if (parent_fd == -1) {
>             if (errno == EINVAL)
>                 printf("Can't get parent namespace of a "
>                        "nonhierarchical namespace\n");
>             else if (errno == EPERM)
>                 printf("The parent namespace is outside "
>                        "your namespace scope\n");
>             else
>                 perror("ioctl-NS_GET_PARENT");
>             exit(EXIT_FAILURE);
>         }
>
>         if (fstat(parent_fd, &sb) == -1) {
>             perror("fstat-parentns");
>             exit(EXIT_FAILURE);
>         }
>         printf("Inode number of parent namespace is: %ld\n",
>                (long) sb.st_ino);
>
>         close(parent_fd);
>     }
>
>     exit(EXIT_SUCCESS);
> }
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/