Kyle Moffett wrote:
Here's a simple way to do what you want in userspace:
1) Apply the kernel bind mount options fix (*)
2) Run the following shell script
cat <<'EOF' >fsviews.bash
#! /bin/bash
# First make the subdirectories
mkdir /fsviews_orig
mount -t tmpfs tmpfs /fsviews_orig
mkdir /fsviews_orig/dir1
mkdir /fsviews_orig/dir2
mkdir /fsviews_orig/old
# Now make it read-only with a copy in /fsviews
mkdir /fsviews
mount --bind /fsviews_orig /fsviews
# Put directories in /fsviews
mount --bind /somewhere/dir1 /fsviews/dir1
mount --bind -o ro /otherplace/dir2 /fsviews/dir2
# Start the process in a new namespace
clone_prog bash <<'BACK_TO_OLD_NAMESPACE'
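# Make the tmpfs skeleton read-only, then switch the root to /fsviews
mount -o ro,remount /fsviews_orig
pivot_root /fsviews /fsviews/old
# After pivot_root the old root is visible at /old, not /fsviews/old
umount -l /old
/dir1/myscript &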
BACK_TO_OLD_NAMESPACE
# Remove the extra dirs in this namespace
umount -l /fsviews
umount -l /fsviews_orig
rmdir /fsviews
rmdir /fsviews_orig
EOF
This assumes that clone_prog is a short C program that does a clone()
syscall with the CLONE_NEWNS flag and executes a new process.
Once this is done, "/dir1/myscript" is running in a _completely_ new
namespace with a read-only root directory and two directories from other
parts of the vfs.
(*) IIRC currently bind-mount rw/ro options are those of the underlying
mount, the bind-mount options fix provides a separate set of options for
each bound copy. There is only one minimal security implication without
said patch, that root can still 'mount -o rw,remount /' to get root
writeable again, but since it's on tmpfs, that doesn't matter much. You
could also just take away some capabilities, but otherwise except for
the shared process tables this acts very much like a completely new,
separate computer. I've used this to thoroughly secure minimally trusted
daemons before. :-D
Cheers,
Kyle Moffett
This provides minimal protection, if any: the user may remount any block
devices on any given tree in his 'namespace' (in the sense of "that is
what we call a mount-table in Linux"). [*]
If I understand what Hans is looking to get done, he's asking for
someone to architect a system where any given process can be restricted
to seeing/accessing a subset of the namespace (in the sense of "a tree
of directories/files"). E.g.: process Foo is allowed to write to
/etc/group, but is _not_ allowed access to /etc/shadow under any
circumstances, even though Foo will be run as root. Hell, maybe Foo is
never able to even _see_ /etc/shadow (making it a true shadow file :).
Hans, correct me if I misunderstood.
[*] Somebody really should s/struct namespace/struct mounttable/g (or
even mounttree) on the kernel sources. 'Namespace' isn't very
descriptive and it leads to confusion :(