Re: [RFC][PATCH] rbind across namespaces

From: Mike Waychison
Date: Tue May 24 2005 - 12:47:31 EST


Miklos Szeredi wrote:
In what sense? readlink of /proc/PID/fd/* will provide a pathname
relative to current's root: useless for any paths not in current's
namespace.


Not readlink, but actual dereference of link will give you exactly the
vfsmount/dentry the file was opened on. If you want to bind/move
whatever on that mount, that's possible, even if it's a "detached
tree".

Removing proc_check_root essentially removes any protection that namespaces provided in the first place.

Think of it like virtual memory. You can fork off and create your own COW instance, and no one can see what you are doing unless they ptrace you or explicitly ask you. The mountfd model allows for explicit handing off of vfsmounts between namespaces without allowing arbitrary access.


Note: I didn't say we should remove proc_check_root(). I said _if_
you remove it, _then_ mounting on foreign namespace will work, which
has actually been confirmed experimentally.


OK.

So your follow_link argument doesn't hold. The proc code does the
follow_link in some clever way, that the looked up object will end up
with the same vfsmount/dentry pair as the file.


Yes. I hadn't realized that the only safe-guard in place was proc_check_root. I had made the assumption that follow_link was using vfs_follow_link with the readlink info.


Yet even this doesn't allow userspace to define it's own policy for inter-namespace manipulation.


OK. Let's keep the kernel policy simple: just allow a process to
access it's _own_ file descriptors in proc. I.e. allow access to
/proc/self/fd/* even if it comes from a foreign namespace.

Then the policy (who passes whom the file descriptors) is entirely up
to userspace (just as with your scheme).

/proc/self/fd/FD doesn't give any extra rights to the process, it just
makes mounting from/to detached mounts, and foreign namespaces
possible without new interfaces.

So you'd say 'mount /dev/foo /proc/self/fd/4' if 4 was an fd pointing to a directory in another namespace?

That does require proc_check_root be removed. :\



Beware that due to the detached-subtrees bit, the locking became a bit ugly, requiring a global rw_lock for mntget/mntput. I still haven't figured out a better way to keep per-vfsmounts counts and per-subtree counts in sync.


Did you measure the effect on performace? Maybe it isn't so bad.

As far as I could tell without doing high-smp benchmarks, it doesn't slow anything down. In this case, we aren't on the fastest path anyway, as we are usually

a) walking a path across a mountpoint, requiring the global vfsmount_lock anyway.
b) closing a file, which isn't fast either.

You could micro-optimize the pivoting of from one vfsmount to another during a walk to only grab the lock once, but there may be no profound effect and any gains would be lost in the noise.

Mike Waychison
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/