Re: [RFC PATCH 00/20] Introduce the famfs shared-memory file system

From: Luis Chamberlain
Date: Fri Feb 23 2024 - 19:07:20 EST


On Fri, Feb 23, 2024 at 11:41:44AM -0600, John Groves wrote:
> This patch set introduces famfs[1] - a special-purpose fs-dax file system
> for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> CXL-specific in any way.
>
> * Famfs creates a simple access method for storing and sharing data in
> sharable memory. The memory is exposed and accessed as memory-mappable
> dax files.
> * Famfs supports multiple hosts mounting the same file system from the
> same memory (something existing fs-dax file systems don't do).
> * A famfs file system can be created on either a /dev/pmem device in fs-dax
> mode, or a /dev/dax device in devdax mode (the latter depending on
> patches 2-6 of this series).
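
(As an illustration of "memory-mappable dax files": from an application's
point of view a famfs file is opened and mapped like any other file, and
loads/stores then go straight to the shared memory behind it. A minimal
sketch of that access pattern follows; the mount point and file name are
made up for illustration, and nothing below is famfs-specific API:)

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		/* Hypothetical famfs mount point and file name */
		int fd = open("/mnt/famfs/file1", O_RDWR);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* Map 2MiB of the file; must not exceed the file's size */
		size_t len = 2 * 1024 * 1024;
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* Loads and stores act directly on the shared (dax) memory */
		memcpy(p, "hello famfs", 12);

		munmap(p, len);
		close(fd);
		return 0;
	}
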
>
> The famfs kernel file system is part of the famfs framework; additional
> components in user space[2] handle metadata and direct the famfs kernel
> module to instantiate files that map to specific memory. The famfs user
> space has documentation and a reasonably thorough test suite.
>
> The famfs kernel module never accesses the shared memory directly (either
> data or metadata). Because of this, shared memory managed by the famfs
> framework does not create a RAS "blast radius" problem that could crash
> or destabilize the kernel. Poison or timeouts in famfs memory
> can be expected to kill apps via SIGBUS and cause mounts to be disabled
> due to memory failure notifications.
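
(On the SIGBUS point above: an application that prefers to handle poison
rather than be killed can use the usual POSIX pattern of a SIGBUS handler
plus sigsetjmp()/siglongjmp() around accesses to the mapping. A minimal,
famfs-agnostic sketch of that pattern follows; it is not something famfs
itself provides:)

	#include <setjmp.h>
	#include <signal.h>

	static sigjmp_buf env;

	static void sigbus_handler(int sig)
	{
		(void)sig;
		siglongjmp(env, 1);	/* unwind instead of dying */
	}

	/* Return 0 if reading *p succeeded, -1 if the access raised SIGBUS */
	static int probe_read(volatile char *p)
	{
		struct sigaction sa = { .sa_handler = sigbus_handler };

		sigemptyset(&sa.sa_mask);
		sigaction(SIGBUS, &sa, NULL);

		if (sigsetjmp(env, 1))
			return -1;	/* SIGBUS: poisoned or unreachable memory */

		(void)*p;		/* touch the mapping */
		return 0;
	}
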
>
> Famfs does not attempt to solve concurrency or coherency problems for apps,
> although it does solve these problems in regard to its own data structures.
> Apps may encounter hard concurrency problems, but there are use cases that
> are eminently useful and uncomplicated from a concurrency perspective:
> serial sharing is one (only one host at a time has access), and read-only
> concurrent sharing is another (all hosts can read-cache without worry).

Can you do me a favor, curious if you can run a test like this:

fio --name=ten-1g-per-thread --nrfiles=10 --bs=2M --ioengine=io_uring \
    --direct=1 --group_reporting=1 --alloc-size=1048576 --filesize=1GiB \
    --readwrite=write --fallocate=none --numjobs=$(nproc) \
    --create_on_open=1 --directory=/mnt

What do you get for throughput?

The larger the system and its capacity, the better.

Luis