Re: [PATCH v7 01/14] mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd

From: Chao Peng
Date: Mon Jul 25 2022 - 09:59:22 EST


On Thu, Jul 21, 2022 at 12:27:03PM +0200, Gupta, Pankaj wrote:
>
> > > Normally, a write to unallocated space of a file, or to a hole in a sparse
> > > file, automatically causes space allocation; for memfd, this means
> > > memory allocation. This new seal prevents such automatic allocation,
> > > whether it comes from a direct write() or from a write to a previously
> > > mmap-ed area. The seal does not prevent fallocate(), so an explicit
> > > fallocate() can still allocate and can be used to reserve
> > > memory.
> > >
> > > This is used to prevent unintentional allocation from userspace on a
> > > stray or careless write; any intentional allocation should use an
> > > explicit fallocate(). One of the main use cases is avoiding double memory
> > > allocation for confidential computing, where two memfds back guest
> > > memory, only one memfd is alive at any given time, and we want to
> > > prevent memory allocation for the other memfd, which may have
> > > been mmap-ed previously. More discussion can be found at:
> > >
> > > https://lkml.org/lkml/2022/6/14/1255
> > >
> > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
> > > ---
> > > include/uapi/linux/fcntl.h | 1 +
> > > mm/memfd.c | 3 ++-
> > > mm/shmem.c | 16 ++++++++++++++--
> > > 3 files changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > index 2f86b2ad6d7e..98bdabc8e309 100644
> > > --- a/include/uapi/linux/fcntl.h
> > > +++ b/include/uapi/linux/fcntl.h
> > > @@ -43,6 +43,7 @@
> > > #define F_SEAL_GROW 0x0004 /* prevent file from growing */
> > > #define F_SEAL_WRITE 0x0008 /* prevent writes */
> > > #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
> > > +#define F_SEAL_AUTO_ALLOCATE 0x0020 /* prevent allocation for writes */
> >
> > Why only "on writes" and not "on reads"? IIRC, shmem doesn't support the
> > shared zeropage, so you'll simply allocate a new page via read() or on
> > read faults.
> >
> >
> > Also, I *think* you can place pages into shmem via userfaultfd. Not sure
> > if that would count as "auto alloc", but it would certainly bypass fallocate().
>
> I was thinking the same thing, but for a different reason:
>
> "Want to populate private preboot memory with firmware payload", so I was
> thinking userfaultfd could be an option, since direct writes are restricted?

If that turns out to be a side effect, I'd definitely be glad to see it, though
I'm still not clear how userfaultfd would be particularly helpful for that.

Chao
>
> Thanks,
> Pankaj