[GIT PULL] mount api updates

From: Christian Brauner
Date: Thu Aug 24 2023 - 10:48:10 EST


Hey Linus,

/* Summary */
This introduces FSCONFIG_CMD_CREATE_EXCL which allows userspace to
implement something like mount -t ext4 --exclusive /dev/sda /B which
fails if a superblock for the requested filesystem does already exist
instead of silently reusing an existing superblock (see [4] for the
source of the move-mount binary):

(1) ./move-mount -f xfs -o source=/dev/sda4 /A
(2) ./move-mount -f xfs -o noacl,source=/dev/sda4 /B

The initial mounter (1) will create a superblock. The second mounter (2)
will reuse the existing superblock of (1), i.e., (2) creates a
bind-mount. The problem is that reusing an existing superblock means all
mount options other than read-only and read-write will be silently
ignored even if they are incompatible requests. For example, (2) has
requested no POSIX ACL support but since the existing superblock of (1)
is reused POSIX ACL support will remain enabled.

Such silent superblock reuse can easily become a security issue.

After adding support for FSCONFIG_CMD_CREATE_EXCL to mount(8)/util-linux
this can be fixed:

(1*) ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A
(2*) ./move-mount -f xfs --exclusive -o noacl,source=/dev/sda4 /B
Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed

Optional Details
================

As mentioned on the list (cf. [1]-[3]) regular mount requests of the
form mount -t ext4 /dev/sda /A are ambiguous. Userspace cannot be sure
whether this will simply create a bind-mount and therefore reuse an
existing superblock or create a new superblock:

P1 P2
fd_fs = fsopen("ext4"); fd_fs = fsopen("ext4");
fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda");
fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000");

// wins and creates superblock
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
// finds compatible superblock of P1
// sleeps until P1 sets SB_BORN and grabs a reference
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)

fd_mnt1 = fsmount(fd_fs); fd_mnt2 = fsmount(fd_fs);
move_mount(fd_mnt1, "/A") move_mount(fd_mnt2, "/B")

Not just does P2 get a bind-mount but the mount options that P2
requests are silently ignored. The VFS itself doesn't, can't and
shouldn't enforce filesystem specific mount option compatibility. It
only enforces incompatibility for read-only <-> read-write transitions:

mount -t ext4 /dev/sda /A
mount -t ext4 -o ro /dev/sda /B

The read-only request will fail with EBUSY as the VFS can't just
silently transition a superblock from read-write to read-only or vica
versa without risking security issues.

The new FSCONFIG_CMD_CREATE_EXCL command for fsconfig() ensures that
EBUSY is returned if an existing superblock would be reused. Userspace
that needs to be sure that it did create a new superblock with the
requested mount options can request superblock creation using this
command.

This requires the new mount api. With the old mount api it would be
necessary to plumb this through every legacy filesystem's
file_system_type->mount() method. If they want this feature they are
most welcome to switch to the new mount api.

The commit adding the command has detailed explanations what this
command will mean for every single superblock allocation function and
filesystem we have. (Probably the oddest of the bunch is nfs as nfs
allocates one internal superblock per path component during mount - for
nfs4 at least. Superblock reuse here is lenient and frequent and an
implementation detail.)

Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner
Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner
Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner
Link: [4] https://github.com/brauner/move-mount-beneath

/* Testing */
clang: Ubuntu clang version 15.0.7
gcc: (Ubuntu 12.2.0-3ubuntu1) 12.2.0

All patches are based on v6.5-rc1 and have been sitting in linux-next.
No build failures or warnings were observed. All old and new tests in
selftests, and LTP pass without regressions.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with
current mainline.

The following changes since commit 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5:

Linux 6.5-rc1 (2023-07-09 13:53:13 -0700)

are available in the Git repository at:

git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-vfs.fs_context

for you to fetch changes up to 22ed7ecdaefe0cac0c6e6295e83048af60435b13:

fs: add FSCONFIG_CMD_CREATE_EXCL (2023-08-14 18:48:02 +0200)

Please consider pulling these changes from the signed v6.6-vfs.fs_context tag.

Thanks!
Christian

----------------------------------------------------------------
v6.6-vfs.fs_context

----------------------------------------------------------------
Christian Brauner (4):
super: remove get_tree_single_reconf()
fs: add vfs_cmd_create()
fs: add vfs_cmd_reconfigure()
fs: add FSCONFIG_CMD_CREATE_EXCL

fs/fs_context.c | 1 +
fs/fsopen.c | 106 ++++++++++++++++++++++++++++++---------------
fs/super.c | 64 +++++++++++++--------------
include/linux/fs_context.h | 4 +-
include/uapi/linux/mount.h | 3 +-
5 files changed, 107 insertions(+), 71 deletions(-)