Re: [PATCH v2 5/6] shmem: update documentation

From: Hugh Dickins
Date: Tue Apr 18 2023 - 01:30:38 EST


On Thu, 9 Mar 2023, Luis Chamberlain wrote:

> Update the docs to reflect a bit better why some folks prefer tmpfs
> over ramfs and clarify a bit more about the difference between brd
> ramdisks.
>
> While at it, add THP docs for tmpfs, both the mount options and the
> sysfs file.

Okay: the original canonical reference for THP options on tmpfs has
been Documentation/admin-guide/mm/transhuge.rst. You're right that
they would be helpful here too: IIRC (but I might well be confusing
with our Google tree) we used to have them documented in both places,
but grew tired of keeping the two in synch. You're volunteering to
do so! so please check now that they tell the same story.

But nowadays, "man 5 tmpfs" is much more important (and that might
give you a hint for what needs to be done after this series goes into
6.4-rc - and I wonder if there are tmpfs manpage updates needed from
Christian for idmapped too? or already taken care of?).

There's a little detail we do need you to remove, indicated below.

>
> Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>
> Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
> Documentation/filesystems/tmpfs.rst | 57 +++++++++++++++++++++++++----
> 1 file changed, 49 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..1ec9a9f8196b 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,14 +13,25 @@ everything stored therein is lost.
>
> tmpfs puts everything into the kernel internal caches and grows and
> shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space. It has maximum size limits which can
> -be adjusted on the fly via 'mount -o remount ...'
> -
> -If you compare it to ramfs (which was the template to create tmpfs)
> -you gain swapping and limit checking. Another similar thing is the RAM
> -disk (/dev/ram*), which simulates a fixed size hard disk in physical
> -RAM, where you have to create an ordinary filesystem on top. Ramdisks
> -cannot swap and you do not have the possibility to resize them.
> +unneeded pages out to swap space, and supports THP.
> +
> +tmpfs extends ramfs with a few userspace configurable options listed and
> +explained further below, some of which can be reconfigured dynamically on the
> +fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
> +filesystem can be resized but it cannot be resized to a size below its current
> +usage. tmpfs also supports POSIX ACLs, and extended attributes for the
> +trusted.* and security.* namespaces. ramfs does not use swap and you cannot
> +modify any parameter for a ramfs filesystem. The size limit of a ramfs
> +filesystem is how much memory you have available, and so care must be taken if
> +used so to not run out of memory.
> +
> +An alternative to tmpfs and ramfs is to use brd to create RAM disks
> +(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
> +To write data you would just then need to create a regular filesystem on top
> +of this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
> +configured in size at initialization and you cannot dynamically resize them.
> +Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
> +block layer at all.
>
> Since tmpfs lives completely in the page cache and on swap, all tmpfs
> pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
> @@ -85,6 +96,36 @@ mount with such options, since it allows any user with write access to
> use up all the memory on the machine; but enhances the scalability of
> that instance in a system with many CPUs making intensive use of it.
>
> +tmpfs also supports Transparent Huge Pages which requires a kernel
> +configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
> +your system (has_transparent_hugepage(), which is architecture specific).
> +The mount options for this are:
> +
> +====== ============================================================
> +huge=0 never: disables huge pages for the mount
> +huge=1 always: enables huge pages for the mount
> +huge=2 within_size: only allocate huge pages if the page will be
> +       fully within i_size, also respect fadvise()/madvise() hints.
> +huge=3 advise: only allocate huge pages if requested with
> +       fadvise()/madvise()

You're taking the source too literally there. Minor point is that there
is no fadvise() for this, to date anyway. Major point is: have you tried
mounting tmpfs with huge=0 etc? I did propose "huge=0" and "huge=1" years
ago, but those "never" went in, it's "always" been the named options.
Please remove those misleading numbers, it's "huge=never" etc.

(Old Google internal trees excepted: and trying to wean people off
"huge=1" internally makes me a bit touchy when seeing those numbers above!)
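
To make the naming point concrete, here is a minimal sketch in Python of
building the mount options from the four names in the quoted table (never,
always, within_size, advise). The helper names tmpfs_huge_option() and
mount_command() are hypothetical, made up for illustration; the sketch only
builds the argument list and does not run mount itself.

```python
from typing import List, Optional

# The named values from the quoted table; numeric forms are not accepted.
VALID_HUGE = ("never", "always", "within_size", "advise")


def tmpfs_huge_option(value: str) -> str:
    """Return a 'huge=<name>' mount option, rejecting anything unnamed."""
    if value not in VALID_HUGE:
        raise ValueError(
            "huge=%s is not a named option; use one of: %s"
            % (value, ", ".join(VALID_HUGE))
        )
    return "huge=" + value


def mount_command(target: str, huge: str, size: Optional[str] = None) -> List[str]:
    """Build (but do not run) a mount command line for a tmpfs instance."""
    opts = [tmpfs_huge_option(huge)]
    if size is not None:
        opts.append("size=" + size)
    return ["mount", "-t", "tmpfs", "-o", ",".join(opts), "tmpfs", target]
```

For example, mount_command("/mnt/test", "within_size", "1G") gives the
arguments for "mount -t tmpfs -o huge=within_size,size=1G tmpfs /mnt/test",
while tmpfs_huge_option("1") raises ValueError.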

> +====== ============================================================
> +
> +There is a sysfs file which you can also use to control system wide THP
> +configuration for all tmpfs mounts, the file is:
> +
> +/sys/kernel/mm/transparent_hugepage/shmem_enabled
> +
> +This sysfs file is placed on top of THP sysfs directory and so is registered
> +by THP code. It is however only used to control all tmpfs mounts with one
> +single knob. Since it controls all tmpfs mounts it should only be used either
> +for emergency or testing purposes. The values you can set for shmem_enabled are:
> +
> +== ============================================================
> +-1 deny: disables huge on shm_mnt and all mounts, for
> +   emergency use
> +-2 force: enables huge on shm_mnt and all mounts, w/o needing
> +   option, for testing

Likewise here, please delete the invalid "-1" and "-2" notations,
-1 and -2 are just #defines for use in the kernel source.

And the description above is not quite accurate: it is very hard to
describe shmem_enabled, partly because it combines two different things.
It's partly the "huge=" mount option for any "internal mount", things
like SysV SHM, memfd, i915 and shared-anonymous: the shmem which has no
user-visible mount to hold the option. But it also provides the "deny"
and "force" overrides, which affect *all* internal and visible mounts.
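
A rough sketch of that split, in Python, in case it helps when wording the
doc. It assumes the file lists every accepted name on one line with the
active one in square brackets (for example "always within_size advise
[never] deny force"); that layout and the helper names are assumptions for
illustration, so please check against a running kernel.

```python
import re

SHMEM_ENABLED = "/sys/kernel/mm/transparent_hugepage/shmem_enabled"

# Same names as the huge= mount option: these only set the internal mount.
INTERNAL_MOUNT_VALUES = {"never", "always", "within_size", "advise"}
# Overrides: these affect all internal and user-visible mounts.
OVERRIDE_VALUES = {"deny", "force"}


def active_value(text: str) -> str:
    """Return the bracketed (currently selected) name from the file text."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError("no bracketed value found in %r" % text)
    return match.group(1)


def classify(value: str) -> str:
    """Say which of the two roles of shmem_enabled a value belongs to."""
    if value in OVERRIDE_VALUES:
        return "override for all internal and visible mounts"
    if value in INTERNAL_MOUNT_VALUES:
        return "huge= setting for the internal mount"
    raise ValueError("unknown shmem_enabled value: %r" % value)


def read_shmem_enabled(path: str = SHMEM_ENABLED) -> str:
    """Read the sysfs file and return the active name (needs THP support)."""
    with open(path) as f:
        return active_value(f.read())
```

So classify("advise") reports the internal-mount role, and
classify("force") reports the override role.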

Hugh

> +== ============================================================
>
> tmpfs has a mount option to set the NUMA memory allocation policy for
> all files in that instance (if CONFIG_NUMA is enabled) - which can be
> --
> 2.39.1