[RFC PATCH 00/20] Introduce the famfs shared-memory file system

From: John Groves
Date: Fri Feb 23 2024 - 12:42:31 EST


This patch set introduces famfs[1] - a special-purpose fs-dax file system
for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
CXL-specific in any way.

* Famfs creates a simple access method for storing and sharing data in
sharable memory. The memory is exposed and accessed as memory-mappable
dax files.
* Famfs supports multiple hosts mounting the same file system from the
same memory (something existing fs-dax file systems don't do).
* A famfs file system can be created on either a /dev/pmem device in fs-dax
mode, or a /dev/dax device in devdax mode (the latter depending on
patches 2-6 of this series).
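
As a rough illustration of the access model described above, the sketch
below maps an existing file read-only with plain POSIX calls. It assumes
only that a famfs file behaves like any other memory-mappable file from
user space; the helper name map_readonly() and any path passed to it are
hypothetical and are not part of the famfs user space tools.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map an existing file read-only and report its length.
 * Returns NULL on any failure (missing file, empty file, mmap error).
 * Hypothetical helper for illustration; not a famfs API.
 */
const void *map_readonly(const char *path, size_t *len_out)
{
	struct stat st;
	void *addr;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;

	if (fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	/* MAP_SHARED so loads are served from the file's backing memory */
	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (addr == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return addr;
}
```

A reader on any host that has the file system mounted could call
map_readonly() on a file under its mount point (for example a
hypothetical /mnt/famfs/dataset) and release the mapping with munmap()
when done.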

The famfs kernel file system is part of the famfs framework; additional
components in user space[2] handle metadata and direct the famfs kernel
module to instantiate files that map to specific memory. The famfs user
space has documentation and a reasonably thorough test suite.

The famfs kernel module never accesses the shared memory directly (either
data or metadata). Because of this, shared memory managed by the famfs
framework should not create a RAS "blast radius" problem that is able
to crash or destabilize the kernel. Poison or timeouts in famfs memory
can be expected to kill apps via SIGBUS and cause mounts to be disabled
due to memory failure notifications.
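
An application that would rather report an error than be killed can, as
an application-level choice, guard its accesses to mapped memory with a
SIGBUS handler. The following is a minimal sketch of that general POSIX
technique and is not something famfs provides; guarded_copy() and
on_sigbus() are hypothetical names, and the single static jump buffer
makes the sketch unsuitable for multi-threaded use.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Jump target used by the handler; one per process in this sketch */
static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	/* Unwind back to the sigsetjmp() call in guarded_copy() */
	siglongjmp(sigbus_env, 1);
}

/*
 * Copy len bytes from src (e.g. a mapped file) into dst.
 * Returns 0 on success, -1 if the handler could not be installed or
 * SIGBUS was raised during the copy (dst contents are then undefined).
 */
int guarded_copy(void *dst, const void *src, size_t len)
{
	struct sigaction sa, old;
	int rc = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return -1;

	/* savemask=1 so the signal mask is restored on siglongjmp() */
	if (sigsetjmp(sigbus_env, 1) == 0)
		memcpy(dst, src, len);
	else
		rc = -1;

	/* Restore whatever SIGBUS disposition was in place before */
	sigaction(SIGBUS, &old, NULL);
	return rc;
}
```

Whether recovering is appropriate depends on the application; after a
failed copy the mapping should generally be treated as unusable.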

Famfs does not attempt to solve concurrency or coherency problems for apps,
although it does solve these problems in regard to its own data structures.
Apps may encounter hard concurrency problems, but there are use cases that
are eminently useful and uncomplicated from a concurrency perspective:
serial sharing is one (only one host at a time has access), and read-only
concurrent sharing is another (all hosts can read-cache without worry).

Contents:

* famfs kernel documentation [patch 1]. Note that evolving famfs user
documentation is at [2].
* dev_dax_iomap patchset [patches 2-6] - This enables fs-dax to use the
iomap interface via a character /dev/dax device (e.g. /dev/dax0.0). For
historical reasons the iomap infrastructure was enabled only for
/dev/pmem devices (which are dax block devices). As famfs is the first
fs-dax file system that works on /dev/dax, this patch series fills in
the bare minimum infrastructure to enable iomap api usage with /dev/dax.
* famfs patchset [patches 7-20] - this introduces the kernel component of
famfs.

IMPORTANT NOTE: There is a developing consensus that /dev/dax requires
some fundamental re-factoring (e.g. [3]) that is related but outside the
scope of this series.

Some observations about using sharable memory

* It does not make sense to online sharable memory as system-ram.
System-ram gets zeroed when it is onlined, so sharing is basically
nonsense.
* It does not make sense to put struct pages in sharable memory, because
those can't be shared. However, separately providing non-sharable
capacity to be used for struct pages might be a sensible approach if the
size of the struct page array for sharable memory is too large to put in
conventional system-ram (albeit with possible RAS implications).
* Sharable memory is pmem-like, in that a host is likely to connect in
order to gain access to data that is already in the memory. Moreover,
the power domain for shared memory is separate from that of the server.
Having observed that, famfs is not intended for persistent storage. It is
intended for sharing data sets in memory during a time frame where the
memory and the compute nodes are expected to remain operational - such
as during a clustered data analytics job.

Could we do this with FUSE?

The key performance requirement for famfs is efficient handling of VMA
faults. This requires caching the complete dax extent lists for all active
files so faults can be handled without upcalls, which FUSE does not do.
It would probably be possible to put this capability into FUSE, but we think
that keeping famfs separate from FUSE is the simpler approach.
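
To illustrate why a cached extent list lets a fault be resolved locally,
here is a small user-space sketch of the lookup involved: a file offset
is translated to a device offset by walking a per-file extent array that
is already in memory. The structure and function names (sketch_extent,
sketch_file_map, sketch_resolve) are hypothetical and are not taken from
the patches; the real kernel code does this through the iomap interface.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous range on the dax device (hypothetical layout) */
struct sketch_extent {
	uint64_t dev_offset; /* byte offset on the device */
	uint64_t len;        /* length in bytes */
};

/* Complete, ordered extent list for one file, cached in memory */
struct sketch_file_map {
	size_t nr_extents;
	const struct sketch_extent *extents;
};

/*
 * Translate a file offset to a device offset using only cached data.
 * On success returns 0 and reports the device offset plus the number
 * of contiguous bytes remaining in that extent; returns -1 if the
 * offset lies beyond the end of the file's extents.
 */
int sketch_resolve(const struct sketch_file_map *map, uint64_t file_off,
		   uint64_t *dev_off, uint64_t *contig_len)
{
	uint64_t start = 0; /* file offset where the current extent begins */

	for (size_t i = 0; i < map->nr_extents; i++) {
		const struct sketch_extent *e = &map->extents[i];
		uint64_t delta = file_off - start; /* file_off >= start here */

		if (delta < e->len) {
			*dev_off = e->dev_offset + delta;
			*contig_len = e->len - delta;
			return 0;
		}
		start += e->len;
	}
	return -1;
}
```

Because everything the lookup needs is already resident, no round trip
to a user space daemon is required on the fault path, which is the
property the paragraph above says FUSE does not currently offer.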

This patch set is available as a branch at [5].

References

[1] https://lpc.events/event/17/contributions/1455/
[2] https://github.com/cxl-micron-reskit/famfs
[3] https://lore.kernel.org/all/166630293549.1017198.3833687373550679565.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx/
[4] https://www.computeexpresslink.org/download-the-specification
[5] https://github.com/cxl-micron-reskit/famfs-linux

John Groves (20):
famfs: Documentation
dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage
dev_dax_iomap: Move dax_pgoff_to_phys from device.c to bus.c since
both need it now
dev_dax_iomap: Save the kva from memremap
dev_dax_iomap: Add dax_operations for use by fs-dax on devdax
dev_dax_iomap: Add CONFIG_DEV_DAX_IOMAP kernel build parameter
famfs: Add include/linux/famfs_ioctl.h
famfs: Add famfs_internal.h
famfs: Add super_operations
famfs: famfs_open_device() & dax_holder_operations
famfs: Add fs_context_operations
famfs: Add inode_operations and file_system_type
famfs: Add iomap_ops
famfs: Add struct file_operations
famfs: Add ioctl to file_operations
famfs: Add fault counters
famfs: Add module stuff
famfs: Support character dax via the dev_dax_iomap patch
famfs: Update MAINTAINERS file
famfs: Add Kconfig and Makefile plumbing

Documentation/filesystems/famfs.rst | 124 +++++
MAINTAINERS | 11 +
drivers/dax/Kconfig | 6 +
drivers/dax/bus.c | 131 ++++++
drivers/dax/dax-private.h | 1 +
drivers/dax/device.c | 38 +-
drivers/dax/super.c | 38 ++
fs/Kconfig | 2 +
fs/Makefile | 1 +
fs/famfs/Kconfig | 10 +
fs/famfs/Makefile | 5 +
fs/famfs/famfs_file.c | 704 ++++++++++++++++++++++++++++
fs/famfs/famfs_inode.c | 586 +++++++++++++++++++++++
fs/famfs/famfs_internal.h | 126 +++++
include/linux/dax.h | 5 +
include/uapi/linux/famfs_ioctl.h | 56 +++
16 files changed, 1821 insertions(+), 23 deletions(-)
create mode 100644 Documentation/filesystems/famfs.rst
create mode 100644 fs/famfs/Kconfig
create mode 100644 fs/famfs/Makefile
create mode 100644 fs/famfs/famfs_file.c
create mode 100644 fs/famfs/famfs_inode.c
create mode 100644 fs/famfs/famfs_internal.h
create mode 100644 include/uapi/linux/famfs_ioctl.h


base-commit: 841c35169323cd833294798e58b9bf63fa4fa1de
--
2.43.0