Re: [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API

From: Sargun Dhillon
Date: Sat Oct 17 2020 - 19:59:49 EST


On Fri, Oct 16, 2020 at 05:45:47AM -0700, Sargun Dhillon wrote:
> This patchset adds some functionality to allow NFS to be used from
> NFS namespaces (containers).
>
> Changes since v1:
> * Added samples
>
> Sargun Dhillon (3):
> NFS: Use cred from fscontext during fsmount
> samples/vfs: Split out common code for new syscall APIs
> samples/vfs: Add example leveraging NFS with new APIs and user
> namespaces
>
> fs/nfs/client.c | 2 +-
> fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
> fs/nfs/nfs4client.c | 2 +-
> samples/vfs/.gitignore | 2 +
> samples/vfs/Makefile | 5 +-
> samples/vfs/test-fsmount.c | 86 +-----------
> samples/vfs/test-nfs-userns.c | 181 +++++++++++++++++++++++++
> samples/vfs/vfs-helper.c | 43 ++++++
> samples/vfs/vfs-helper.h | 55 ++++++++
> 9 files changed, 289 insertions(+), 88 deletions(-)
> create mode 100644 samples/vfs/test-nfs-userns.c
> create mode 100644 samples/vfs/vfs-helper.c
> create mode 100644 samples/vfs/vfs-helper.h
>
> --
> 2.25.1
>

Digging into this a little deeper, I found that there are some problematic
aspects of the current behaviour. nfs_get_tree_common calls sget_fc, and
sget_fc sets the super block's s_user_ns (via alloc_super) to the fs_context's
user namespace unless the global flag is set (which NFS does not set). As a
result, a bunch of permission checks are done against the super block's
user_ns.
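
To make the selection described above concrete, here is a minimal user-space
sketch. The struct definitions are cut-down stand-ins for the kernel types,
and pick_sb_user_ns is a hypothetical helper name, not a kernel function; it
only models the choice sget_fc makes before handing the namespace to
alloc_super.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct user_namespace {
	const char *name;
};

struct fs_context {
	bool global;                    /* filesystem asks for init_user_ns */
	struct user_namespace *user_ns; /* namespace captured in the fs_context */
};

static struct user_namespace init_user_ns = { "init_user_ns" };

/*
 * Hypothetical helper modelling the choice described above: the super
 * block gets the fs_context's namespace unless the global flag is set.
 */
static struct user_namespace *pick_sb_user_ns(const struct fs_context *fc)
{
	return fc->global ? &init_user_ns : fc->user_ns;
}
```

Since NFS leaves the global flag clear, a mount set up from inside a container
would end up with the container's namespace as s_user_ns in this model.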

It looks like this was introduced in:
f2aedb713c28: NFS: Add fs_context support[1]

It turns out that unmapped users in the "parent" user namespace just get an
EOVERFLOW error when trying to perform a read, even if the UID sent to the NFS
server to read the file is a valid UID (the UID in the init user ns).
inode_permission checks permissions against the mapped UID in the namespace,
while the authentication credentials (UIDs, GIDs) sent to the server are
those from the init user ns.
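
As a rough illustration of what "unmapped" means here, the sketch below models
a single-extent ID map in user space. The names id_extent, map_id_up_model and
INVALID_ID are hypothetical and the kernel's real mapping code is more
involved; the only point is that an ID outside the mapped range has no
representation inside the namespace.

```c
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/* One mapping extent, in the spirit of a single uid_map line. */
struct id_extent {
	uint32_t first;       /* first ID as seen inside the namespace */
	uint32_t lower_first; /* first ID as seen in the parent namespace */
	uint32_t count;       /* number of consecutive IDs mapped */
};

/*
 * Translate a parent-namespace ID into the namespace's view.
 * Returns INVALID_ID when the ID falls outside the extent.
 */
static uint32_t map_id_up_model(const struct id_extent *e, uint32_t kid)
{
	if (kid >= e->lower_first && kid - e->lower_first < e->count)
		return e->first + (kid - e->lower_first);
	return INVALID_ID;
}
```

With an extent of { 0, 100000, 65536 }, parent UID 100005 maps to 5 inside the
namespace, while parent UID 1000 has no mapping at all. That is the kind of
user that hits the error described above, even though UID 1000 is perfectly
valid from the server's point of view.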

[This is all under the assumption that there are no upcalls doing ID mapping]

Although I do not think this presents any security risk (because you have to
have CAP_SYS_ADMIN in the init user ns to get this far), it definitely seems
like "incorrect" behaviour.

[1]: https://lore.kernel.org/linux-nfs/20191120152750.6880-26-smayhew@xxxxxxxxxx/