Re: [PATCH v3 0/7] File Sealing & memfd_create()

From: David Herrmann
Date: Tue Jul 08 2014 - 12:55:04 EST


Hi

On Fri, Jun 13, 2014 at 12:36 PM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> Hi
>
> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
> a longer introduction at gmane:
> http://thread.gmane.org/gmane.comp.video.dri.devel/102241
> An LWN article about memfd+sealing is available, too:
> https://lwn.net/Articles/593918/
> v2 with some more discussions can be found here:
> http://thread.gmane.org/gmane.linux.kernel.mm/115713
>
> This series introduces two new APIs:
> memfd_create(): Think of this syscall as malloc(), except that it returns a
> file-descriptor instead of a pointer. The file-descriptor is
> backed by anonymous memory and can be memory-mapped for access.
> sealing: The sealing API can be used to prevent a specific set of operations
> on a file-descriptor. You 'seal' the file and thereby
> guarantee that it cannot be modified in those specific ways.
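To make the malloc() analogy concrete, here is a minimal userspace sketch. It is written against the interface as eventually merged (Linux 3.17, assuming glibc 2.27 or later for the memfd_create() wrapper and the seal constants), which may differ in detail from this v3 posting. The helpers make_sealed_blob() and blob_is_sealed() are hypothetical names for illustration: the sender creates a memfd, sizes it, fills it through a shared mapping and then seals it; a receiver checks the seals before trusting the contents.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into a new memfd, then seal it
 * so it can no longer be written, shrunk, grown or re-sealed.
 * Returns the fd, or -1 on error. */
int make_sealed_blob(const char *name, const void *data, size_t len)
{
    void *p;
    int fd;

    assert(len > 0); /* mmap() rejects zero-length mappings */

    fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* The file starts out empty; give it a size explicitly. */
    if (ftruncate(fd, (off_t)len) < 0)
        goto fail;

    /* Fill it through a shared mapping, like malloc()ed memory. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    memcpy(p, data, len);

    /* F_SEAL_WRITE is refused while a shared writable mapping exists,
     * so drop the mapping before sealing. */
    munmap(p, len);

    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}

/* Hypothetical receiver-side check: are all `required` seals set? */
int blob_is_sealed(int fd, int required)
{
    int seals = fcntl(fd, F_GET_SEALS);

    return seals >= 0 && (seals & required) == required;
}
```

The receiver only needs the fd: once blob_is_sealed(fd, F_SEAL_WRITE | F_SEAL_SHRINK) succeeds, it can map and read the buffer without copying it first, because the sender can no longer change or truncate it.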
>
> A short high-level introduction is also available here:
> http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>
>
> Changed in v3:
> - fcntl() now returns EINVAL if the FD does not support sealing. We used to
> return EBADF like pipe_fcntl() does, but that is really weird and I don't
> want to replicate it.
> - seals are now saved as "unsigned int" instead of "u32".
> - i_mmap_writable is now an atomic so we can deny writable mappings just like
> i_writecount does.
> - SHMEM_ALLOW_SEALING is dropped. We initialize all objects with F_SEAL_SEAL
> and only unset it for memfds that shall support sealing.
> - memfd_create() no longer has a size argument. It was redundant; use
> ftruncate() or fallocate() instead.
> - memfd_create() flags are "unsigned int" now, instead of "u64".
> - NAME_MAX off-by-one fix
> - several cosmetic changes
> - Added AIO/Direct-IO page-pinning protection
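Two of the changes above are visible from userspace: a memfd is now sized with ftruncate() (or fallocate()), and F_GET_SEALS distinguishes "sealing not supported" (EINVAL) from "sealing locked" (F_SEAL_SEAL set, which is the default for memfds created without MFD_ALLOW_SEALING). The sketch below assumes the merged API and glibc 2.27 or later; sealing_state() and memfd_with_size() are hypothetical helpers, and a pipe is used merely as an example of an FD that does not support sealing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

enum sealing_state {
    SEALING_UNSUPPORTED, /* F_GET_SEALS failed with EINVAL */
    SEALING_LOCKED,      /* F_SEAL_SEAL set: no further seals may be added */
    SEALING_OPEN,        /* seals may still be added */
    SEALING_ERROR        /* any other failure */
};

/* Hypothetical helper: classify the sealing state of an fd. */
enum sealing_state sealing_state(int fd)
{
    int seals = fcntl(fd, F_GET_SEALS);

    if (seals < 0)
        return errno == EINVAL ? SEALING_UNSUPPORTED : SEALING_ERROR;
    return (seals & F_SEAL_SEAL) ? SEALING_LOCKED : SEALING_OPEN;
}

/* Hypothetical helper: memfd_create() has no size argument, so size
 * the new file with ftruncate(). Returns the fd, or -1 on error. */
int memfd_with_size(const char *name, unsigned int flags, off_t size)
{
    int fd = memfd_create(name, flags);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With these helpers, a pipe reports SEALING_UNSUPPORTED, a memfd created with only MFD_CLOEXEC reports SEALING_LOCKED, and one created with MFD_ALLOW_SEALING reports SEALING_OPEN.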
>
> The last point is the most important change in this version: we now bail out if
> any page refcount is elevated while setting F_SEAL_WRITE. This prevents parallel
> GUP users from writing to sealed files _after_ they were sealed. There is also a
> new FUSE-based test-case to trigger such situations.
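The page-refcount check happens inside the kernel and is hard to trigger without AIO or the FUSE test. The closely related mapping rule (the i_mmap_writable accounting) is easy to observe from userspace: with the merged API, adding F_SEAL_WRITE fails with EBUSY while a shared writable mapping exists, and once the seal is in place write() fails with EPERM. The sketch below assumes the merged behaviour and glibc 2.27 or later; try_seal_write() and seal_write_demo() are hypothetical names, and the demo does not exercise the GUP/pin path itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to add F_SEAL_WRITE.
 * Returns 0 on success, otherwise the errno value. */
int try_seal_write(int fd)
{
    return fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0 ? errno : 0;
}

/* Hypothetical demo of the ordering rule.
 * Returns 0 if the kernel behaves as described, 1 if not, -1 on setup error. */
int seal_write_demo(void)
{
    const size_t len = 4096;
    int fd = memfd_create("seal-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int busy = try_seal_write(fd);   /* writable mapping alive: expect EBUSY */
    munmap(p, len);
    int ok = try_seal_write(fd);     /* mapping gone: expect success */
    ssize_t w = write(fd, "x", 1);   /* sealed: expect -1 with EPERM */
    int w_err = errno;

    close(fd);
    return (busy == EBUSY && ok == 0 && w == -1 && w_err == EPERM) ? 0 : 1;
}
```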
>
> The last 2 patches try to improve the page-pinning handling. I included both in
> this series, but obviously only one of them is needed (or we could stack them):
> - 6/7: This waits for up to 150ms for pages to be unpinned
> - 7/7: This isolates pinned pages and replaces them with a fresh copy
>
> Hugh, patch 6 is basically your code. In case that gets merged, can I put your
> Signed-off-by on it?

Hugh, any comments on patches 5, 6 and 7? Those are the last outstanding
issues with memfd+sealing. Patch 7 (isolating pages) is still my
favorite and has been running just fine on my machine for the past few
months. I think it'd be nice if we could give it a try in -next. We
can always fall back to Patch 5 or Patch 5+6. Both detect racing AIO:
Patch 5 alone simply fails, while 5+6 first waits a short period for
the I/O to finish.

Are there any other blockers for this?

Thanks
David