Re: RFC: Multiple instances of kernel namespaces.

From: Eric W. Biederman
Date: Sat Jan 21 2006 - 05:04:02 EST


"Serge E. Hallyn" <serue@xxxxxxxxxx> writes:

> Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
>>
>> At this point I have to confess I have been working on something
>> similar, to IBM's pid virtualization work. But I have what is at
>> least for me a unifying concept, that makes things easier to think
>> about.
>>
>> The idea is to think about things in terms of namespaces. Currently
>> in the kernel we have the fs/mount namespace already implemented.
>>
>> Partly this helps on what the interface for creating a new namespace
>> instance should be. 'clone(CLONE_NEW<NAMESPACE_TYPE>)', and how
>> it should be managed from the kernel data structures.
>>
>> Partly thinking of things as namespaces helps me scope the problem.
>>
>> Does this sound like a sane approach?
>
> And a bonus of this is that for security and vserver-type applications,
> the CLONE_NEWPID and CLONE_NEWFS will often happen at the same time.
>
> How do you (or do you?) address naming namespaces? This would be
> necessary for transitioning into an existing namespace, performing
> actions on existing namespaces (i.e. checkpoint, migrate to another
> machine, enter the namespace and kill pid 521), and would just be
> useful for accounting purposes, i.e. how else do you have a
> "ps --all-namespaces" specify a process' namespace?

So I address naming indirectly. The last thing I want to have
is to add yet another namespace to the kernel for naming namespaces.
We have enough namespaces already.

In any sane context for a pid-namespace we need a pid that
we can call waitpid on, so we don't break the process tree.
Which means at least the init process has 2 pids, one
that it's parent sees, and another (1) that it and it's
children see.

So I name pidspaces like we do sessions of process groups
and sessions by the pid of the leader.

So in the simple case I have names like:
1178/1632

> Doubt we want to add an argument to clone(), so do we just add a new
> proc, sysfs, or syscall for setting a pid-namespace name?

That shouldn't be necessary.

> Do we need a new syscall for transitioning into an existing namespace?

That is a good question. The FS namespaces that we already have
has much the same problem. A completely different solution to
this problem seems to have been implemented but I don't grasp it
yet.

Inherently transitioning to an existing namespace is something
that is straight forward to implement, so it is worth thinking
about.

If I want a guest that can keep secrets from the host sysadmin I don't
want transitioning into a guest namespace to come too easily.

Currently I can always just create an extra child of pid 1
that I will be my slave. The problem is that this is an extra
process laying around.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/