Re: [PATCH -mm 5/7] add user namespace

From: Kyle Moffett
Date: Sat Jul 15 2006 - 09:29:12 EST


On Jul 15, 2006, at 08:35:18, Eric W. Biederman wrote:
Kyle Moffett <mrmacman_g4@xxxxxxx> writes:
With NFS and the proposed superblock-sharing patches (necessary for efficiency and other reasons I don't entirely understand), the situation is worse: A mount of server:/foo/bar on / in the bar virtual machine may get its superblock merged with a mount of server:/foo/baz on / in the baz virtual machine. If it's efficient to merge those superblocks we should, and once again it's necessary to tie the UID namespace to the vfsmount, not the superblock.

I completely agree that pushing nameidata down into generic_permission where we can use per mount properties in our permission checks is ideal. The benefit I see is just a small increase in flexibility. So I don't really care either way.

Currently there are several additional flags that could benefit from a per vfsmount interpretation as well: nosuid, noexec, nodev, and readonly. How do we handle those?

noexec is on the vfsmount.
nosuid is on the vfsmount.
nodev is on the vfsmount.
readonly is not on the vfsmount.

The existing precedent is clearly in favor of putting this kind of information on the vfsmount. The read-only attribute seems to be the only holdout. If readonly has deep implications like no journal replay it makes sense to keep it per mount. Which indicates we could use a nowrite option to express the per vfsmount property.

Well, speaking of that; there's been another thread recently that's splitting the properties of read-only between vfsmount and superblock. So a read-only superblock implies read-only vfsmounts, but the following can create a read-only vfsmount for a writable superblock:

mount --bind / /mnt/read-only-root
mount -o ro,remount /mnt/read-only-root

So the readonly special case will go away.

I hope the confusion has passed for Trond. My impression was he figured this was per process data so it didn't make sense anywhere near a filesystem, and the superblock was the last place it should be.

One of the things I said earlier in this thread is that "Both filesystems _and_ processes should be first-class objects in any UID namespace". In order to have sufficient access controls in the presence of _only_ a UID-namespace (as opposed to with full container isolation), you need to check against an object *and* the namespace in which it is located. In some cases, the object is a file, which means that the inode, vfsmount, or superblock needs a UID namespace reference. Theoretically you could implement per-file UID namespace pointers, but that would probably be incredibly inefficient. IMHO, per-vfsmount gives the best flexibility and efficiency of the three.

In fact, it's strange to think about this in context with the rest of the namespaces that are being designed, but processes would ordinarily *not* have primary presence in a UID namespace if they weren't the target of UID-verified operations in and of themselves (e.g. kill, ptrace, etc). Otherwise they would just have a set of (namespace,UID,cap_flags) tuples to give them access to filesystems in specific UID namespaces.
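
The set of tuples could look something like the following sketch. This is only one possible shape under my own assumptions: the cr_* types and the linear lookup are hypothetical, and nothing here reflects an existing kernel interface. A process carries a small table of (namespace,UID,cap_flags) entries and looks up the entry matching the namespace of the filesystem it is accessing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical UID namespace; compared by address only. */
struct cr_uid_ns {
    int id;
};

/* One (namespace, UID, cap_flags) entry held by a process. */
struct cr_entry {
    const struct cr_uid_ns *ns;
    unsigned int uid;
    unsigned int cap_flags;
};

/* Simplified process: just its table of per-namespace identities. */
struct cr_task {
    const struct cr_entry *entries;
    size_t nentries;
};

/*
 * Return the identity the task holds in the given namespace, or NULL
 * if it holds none (in which case access would be denied or treated
 * as an unprivileged outsider, depending on policy).
 */
const struct cr_entry *cr_find(const struct cr_task *task,
                               const struct cr_uid_ns *ns)
{
    assert(task != NULL);
    for (size_t i = 0; i < task->nentries; i++) {
        if (task->entries[i].ns == ns)
            return &task->entries[i];
    }
    return NULL;
}
```

A permission check would then call cr_find() with the namespace of the mount being accessed and compare the returned uid and cap_flags against the file, rather than consulting a single per-process UID.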

Cheers,
Kyle Moffett
