Re: [PATCH v2 6/6] shmem: add support to ignore swap

From: Christian Brauner
Date: Tue Apr 18 2023 - 03:38:24 EST


On Mon, Apr 17, 2023 at 10:50:59PM -0700, Hugh Dickins wrote:
> On Thu, 9 Mar 2023, Luis Chamberlain wrote:
>
> > In doing experimentations with shmem having the option to avoid swap
> > becomes a useful mechanism. One of the *raves* about brd over shmem is
> > you can avoid swap, but that's not really a good reason to use brd if
> > we can instead use shmem. Using brd has its own good reasons to exist,
> > but just because "tmpfs" doesn't let you do that is not a great reason
> > to avoid it if we can easily add support for it.
> >
> > I don't add support for reconfiguring incompatible options, but if
> > we really wanted to we can add support for that.
> >
> > To avoid swap we use mapping_set_unevictable() upon inode creation,
> > and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.
>
> I have one big question here, which betrays my ignorance:
> I hope that you or Christian can reassure me on this.
>
> tmpfs has fs_flags FS_USERNS_MOUNT. I know nothing about namespaces,
> nothing; but from overhearings, wonder if an ordinary user in a namespace
> might be able to mount their own tmpfs with "noswap", and thereby evade
> all accounting of the locked memory.
>
> That would be an absolute no-no for this patch; but I assume that even
> if so, it can be easily remedied by inserting an appropriate (unknown
> to me!) privilege check where the "noswap" option is validated.

Oh, good catch. Thanks! So you would just need something like:

diff --git a/mm/shmem.c b/mm/shmem.c
index 787e83791eb5..21ce9b26bb4d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3571,6 +3571,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
 	case Opt_noswap:
+		if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN)) {
+			return invalfc(fc,
+				       "Turning off swap in unprivileged tmpfs mounts unsupported");
+		}
 		ctx->noswap = true;
 		ctx->seen |= SHMEM_SEEN_NOSWAP;
 		break;

fc->user_ns is the userns that the tmpfs mount will be mounted in, i.e.,
fc->user_ns will become sb->s_user_ns if FS_USERNS_MOUNT is raised. So with the
check above we require that the tmpfs instance ultimately belongs to the
initial userns and that the caller has CAP_SYS_ADMIN in the initial userns.
According to capabilities(7), CAP_SYS_ADMIN is also what guards swapon and
swapoff.
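
To make the two conditions concrete, here is a rough userspace model of the
check. It is only a sketch: struct user_ns_model, init_ns_model and
noswap_allowed() are hypothetical stand-ins invented for illustration, not
kernel types or helpers, and the boolean argument stands in for what
capable(CAP_SYS_ADMIN) would return for the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; not the kernel's struct. */
struct user_ns_model {
	int id;
};

/* Hypothetical stand-in for init_user_ns. */
struct user_ns_model init_ns_model = { 0 };

/*
 * Models the condition from the diff: "noswap" is accepted only if the
 * mount's userns is the initial one AND the caller holds CAP_SYS_ADMIN
 * in the initial userns. Either condition failing rejects the option.
 */
bool noswap_allowed(const struct user_ns_model *fc_user_ns,
		    bool cap_sys_admin_in_init_ns)
{
	assert(fc_user_ns != NULL);

	if (fc_user_ns != &init_ns_model || !cap_sys_admin_in_init_ns)
		return false;	/* the real patch returns invalfc(...) here */

	return true;
}
```

In this model, a caller that is "root" only inside a child userns is rejected
because its fs_context would carry that child userns, regardless of the
capability argument.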