Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

From: Jeff Layton
Date: Thu Dec 14 2023 - 09:12:02 EST


On Wed, 2023-12-13 at 15:23 +0000, David Howells wrote:
> Hi Jeff, Steve, Dominique,
>
> I have been working on my netfslib helpers to the point that I can run
> xfstests on AFS to completion (both with write-back buffering and, with a
> small patch, write-through buffering in the pagecache). I have a patch for
> 9P, but am currently unable to test it.
>
> The patches remove a little over 800 lines from AFS, 300 from 9P, albeit with
> around 3000 lines added to netfs. Hopefully, I will be able to remove a bunch
> of lines from Ceph too.
>
> I've split the CIFS patches out to a separate branch, cifs-netfs, where a
> further 2000+ lines are removed. I can run a certain amount of xfstests on
> CIFS, though I'm running into ksmbd issues and not all the tests work
> correctly because of issues between fallocate and what the SMB protocol
> actually supports.
>
> I've also dropped the content-crypto patches out for the moment as they're
> only usable by the ceph changes which I'm still working on.
>
> The patch to use PG_writeback instead of PG_fscache for writing to the
> cache has also been deferred, pending 9p, afs, ceph and cifs all being
> converted.
>
> The main aims of these patches are to get high-level I/O and knowledge of
> the pagecache out of the filesystem drivers as much as possible and to get
> rid, as much of possible, of the knowledge that pages/folios exist.
>
> Further, I would like to see ->write_begin, ->write_end and ->launder_folio
> go away.
>
> Features that are added by these patches to that which is already there in
> netfslib:
>
> (1) NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to
> prevent these from happening at the same time. mmap'd I/O can, of
> necessity, happen at any time ignoring these locks.
>
> (2) Support for unbuffered I/O. The data is kept in the bounce buffer and
> the pagecache is not used. This can be turned on with an inode flag.
>
> (3) Support for direct I/O. This is basically unbuffered I/O with some
> extra restrictions and no RMW.
>
> (4) Support for using a bounce buffer in an operation. The bounce buffer
> may be bigger than the target data/buffer, allowing for crypto
> rounding.
>
> (5) ->write_begin() and ->write_end() are ignored in favour of merging all
> of that into one function, netfs_perform_write(), thereby avoiding the
> function pointer traversals.
>
> (6) Support for write-through caching in the pagecache.
> netfs_perform_write() adds the pages is modifies to an I/O operation
> as it goes and directly marks them writeback rather than dirty. When
> writing back from write-through, it limits the range written back.
> This should allow CIFS to deal with byte-range mandatory locks
> correctly.
>
> (7) O_*SYNC and RWF_*SYNC writes use write-through rather than writing to
> the pagecache and then flushing afterwards. An AIO O_*SYNC write will
> notify of completion when the sub-writes all complete.
>
> (8) Support for write-streaming where modifed data is held in !uptodate
> folios, with a private struct attached indicating the range that is
> valid.
>
> (9) Support for write grouping, multiplexing a pointer to a group in the
> folio private data with the write-streaming data. The writepages
> algorithm only writes stuff back that's in the nominated group. This
> is intended for use by Ceph to write is snaps in order.
>
> (10) Skipping reads for which we know the server could only supply zeros or
> EOF (for instance if we've done a local write that leaves a hole in
> the file and extends the local inode size).
>
> General notes:
>
> (1) The fscache module is merged into the netfslib module to avoid cyclic
> exported symbol usage that prevents either module from being loaded.
>
> (2) Some helpers from fscache are reassigned to netfslib by name.
>
> (3) netfslib now makes use of folio->private, which means the filesystem
> can't use it.
>
> (4) The filesystem provides wrappers to call the write helpers, allowing
> it to do pre-validation, oplock/capability fetching and the passing in
> of write group info.
>
> (5) I want to try flushing the data when tearing down an inode before
> invalidating it to try and render launder_folio unnecessary.
>
> (6) Write-through caching will generate and dispatch write subrequests as
> it gathers enough data to hit wsize and has whole pages that at least
> span that size. This needs to be a bit more flexible, allowing for a
> filesystem such as CIFS to have a variable wsize.
>
> (7) The filesystem driver is just given read and write calls with an
> iov_iter describing the data/buffer to use. Ideally, they don't see
> pages or folios at all. A function, extract_iter_to_sg(), is already
> available to decant part of an iterator into a scatterlist for crypto
> purposes.
>
>
> 9P notes:
>
> (1) I haven't managed to test this as I haven't been able to get Ganesha
> to work correctly with 9P.
>
> (2) Writes should now occur in larger-than-page-sized chunks.
>
> (3) It should be possible to turn on multipage folio support in 9P now.
>
>
> Changes
> =======
> ver #4)
> - Slimmed down the branch:
> - Split the cifs-related patches off to a separate branch (cifs-netfs)
> - Deferred the content-encryption to the in-progress ceph changes.
> - Deferred the use-PG_writeback rather than PG_fscache patch
> - Rebased on a later linux-next with afs-rotation patches.
>
> ver #3)
> - Moved the fscache module into netfslib to avoid export cycles.
> - Fixed a bunch of bugs.
> - Got CIFS to pass as much of xfstests as possible.
> - Added a patch to make 9P use all the helpers.
> - Added a patch to stop using PG_fscache, but rather dirty pages on
> reading and have writepages write to the cache.
>
> ver #2)
> - Folded the addition of NETFS_RREQ_NONBLOCK/BLOCKED into first patch that
> uses them.
> - Folded addition of rsize member into first user.
> - Don't set rsize in ceph (yet) and set it in kafs to 256KiB. cifs sets
> it dynamically.
> - Moved direct_bv next to direct_bv_count in struct netfs_io_request and
> labelled it with a __counted_by().
> - Passed flags into netfs_xa_store_and_mark() rather than two bools.
> - Removed netfs_set_up_buffer() as it wasn't used.
>
> David
>
> Link: https://lore.kernel.org/r/20231013160423.2218093-1-dhowells@xxxxxxxxxx/ # v1
> Link: https://lore.kernel.org/r/20231117211544.1740466-1-dhowells@xxxxxxxxxx/ # v2
>
> David Howells (39):
> netfs, fscache: Move fs/fscache/* into fs/netfs/
> netfs, fscache: Combine fscache with netfs
> netfs, fscache: Remove ->begin_cache_operation
> netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a
> symlink
> netfs: Move pinning-for-writeback from fscache to netfs
> netfs: Add a procfile to list in-progress requests
> netfs: Allow the netfs to make the io (sub)request alloc larger
> netfs: Add a ->free_subrequest() op
> afs: Don't use folio->private to record partial modification
> netfs: Provide invalidate_folio and release_folio calls
> netfs: Implement unbuffered/DIO vs buffered I/O locking
> netfs: Add iov_iters to (sub)requests to describe various buffers
> netfs: Add support for DIO buffering
> netfs: Provide tools to create a buffer in an xarray
> netfs: Add bounce buffering support
> netfs: Add func to calculate pagecount/size-limited span of an
> iterator
> netfs: Limit subrequest by size or number of segments
> netfs: Export netfs_put_subrequest() and some tracepoints
> netfs: Extend the netfs_io_*request structs to handle writes
> netfs: Add a hook to allow tell the netfs to update its i_size
> netfs: Make netfs_put_request() handle a NULL pointer
> netfs: Make the refcounting of netfs_begin_read() easier to use
> netfs: Prep to use folio->private for write grouping and streaming
> write
> netfs: Dispatch write requests to process a writeback slice
> netfs: Provide func to copy data to pagecache for buffered write
> netfs: Make netfs_read_folio() handle streaming-write pages
> netfs: Allocate multipage folios in the writepath
> netfs: Implement support for unbuffered/DIO read
> netfs: Implement unbuffered/DIO write support
> netfs: Implement buffered write API
> netfs: Allow buffered shared-writeable mmap through
> netfs_page_mkwrite()
> netfs: Provide netfs_file_read_iter()
> netfs, cachefiles: Pass upper bound length to allow expansion
> netfs: Provide a writepages implementation
> netfs: Provide a launder_folio implementation
> netfs: Implement a write-through caching option
> netfs: Optimise away reads above the point at which there can be no
> data
> afs: Use the netfs write helpers
> 9p: Use netfslib read/write_iter
>
> Documentation/filesystems/netfs_library.rst | 23 +-
> MAINTAINERS | 2 +-
> fs/9p/vfs_addr.c | 352 +----
> fs/9p/vfs_file.c | 89 +-
> fs/9p/vfs_inode.c | 5 +-
> fs/9p/vfs_super.c | 14 +-
> fs/Kconfig | 1 -
> fs/Makefile | 1 -
> fs/afs/file.c | 213 +--
> fs/afs/inode.c | 26 +-
> fs/afs/internal.h | 72 +-
> fs/afs/super.c | 2 +-
> fs/afs/write.c | 826 +----------
> fs/cachefiles/internal.h | 2 +-
> fs/cachefiles/io.c | 10 +-
> fs/cachefiles/ondemand.c | 2 +-
> fs/ceph/addr.c | 25 +-
> fs/ceph/cache.h | 35 +-
> fs/ceph/inode.c | 2 +-
> fs/fs-writeback.c | 10 +-
> fs/fscache/Kconfig | 40 -
> fs/fscache/Makefile | 16 -
> fs/fscache/internal.h | 277 ----
> fs/netfs/Kconfig | 39 +
> fs/netfs/Makefile | 22 +-
> fs/netfs/buffered_read.c | 229 ++-
> fs/netfs/buffered_write.c | 1247 +++++++++++++++++
> fs/netfs/direct_read.c | 252 ++++
> fs/netfs/direct_write.c | 170 +++
> fs/{fscache/cache.c => netfs/fscache_cache.c} | 0
> .../cookie.c => netfs/fscache_cookie.c} | 0
> fs/netfs/fscache_internal.h | 14 +
> fs/{fscache/io.c => netfs/fscache_io.c} | 42 +-
> fs/{fscache/main.c => netfs/fscache_main.c} | 25 +-
> fs/{fscache/proc.c => netfs/fscache_proc.c} | 23 +-
> fs/{fscache/stats.c => netfs/fscache_stats.c} | 4 +-
> .../volume.c => netfs/fscache_volume.c} | 0
> fs/netfs/internal.h | 288 ++++
> fs/netfs/io.c | 214 ++-
> fs/netfs/iterator.c | 97 ++
> fs/netfs/locking.c | 215 +++
> fs/netfs/main.c | 110 ++
> fs/netfs/misc.c | 260 ++++
> fs/netfs/objects.c | 63 +-
> fs/netfs/output.c | 478 +++++++
> fs/netfs/stats.c | 31 +-
> fs/nfs/Kconfig | 4 +-
> fs/nfs/fscache.c | 7 -
> fs/smb/client/cifsfs.c | 9 +-
> fs/smb/client/file.c | 18 +-
> fs/smb/client/fscache.c | 2 +-
> include/linux/fs.h | 2 +-
> include/linux/fscache.h | 45 -
> include/linux/netfs.h | 176 ++-
> include/linux/writeback.h | 2 +-
> include/trace/events/afs.h | 31 -
> include/trace/events/netfs.h | 155 +-
> mm/filemap.c | 1 +
> 58 files changed, 4197 insertions(+), 2123 deletions(-)
> delete mode 100644 fs/fscache/Kconfig
> delete mode 100644 fs/fscache/Makefile
> delete mode 100644 fs/fscache/internal.h
> create mode 100644 fs/netfs/buffered_write.c
> create mode 100644 fs/netfs/direct_read.c
> create mode 100644 fs/netfs/direct_write.c
> rename fs/{fscache/cache.c => netfs/fscache_cache.c} (100%)
> rename fs/{fscache/cookie.c => netfs/fscache_cookie.c} (100%)
> create mode 100644 fs/netfs/fscache_internal.h
> rename fs/{fscache/io.c => netfs/fscache_io.c} (86%)
> rename fs/{fscache/main.c => netfs/fscache_main.c} (84%)
> rename fs/{fscache/proc.c => netfs/fscache_proc.c} (58%)
> rename fs/{fscache/stats.c => netfs/fscache_stats.c} (97%)
> rename fs/{fscache/volume.c => netfs/fscache_volume.c} (100%)
> create mode 100644 fs/netfs/locking.c
> create mode 100644 fs/netfs/misc.c
> create mode 100644 fs/netfs/output.c
>

This all looks pretty great, David. Nice work! I had a few comments on a
few of them, but most are no big deal. It'd be nice to get this into
linux-next soon.

On the ones where I didn't have comments, you can add:

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>