Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim

From: Yosry Ahmed
Date: Mon Dec 11 2023 - 14:42:10 EST


On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@xxxxxxxxx> wrote:
>
> Allow proactive reclaimers to submit an additional swappiness=<val>
> argument to memory.reclaim. This overrides the global or per-memcg
> swappiness setting for that reclaim attempt.
>
> For example:
>
> echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
>
> will perform reclaim on the rootcg with a swappiness setting of 0 (no
> swap) regardless of the vm.swappiness sysctl setting.
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@xxxxxxxxx>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 15 ++++++-
> include/linux/swap.h | 3 +-
> mm/memcontrol.c | 55 ++++++++++++++++++++-----
> mm/vmscan.c | 13 +++++-
> 4 files changed, 70 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..fc2b379dbd0f 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back.
> This is a simple interface to trigger memory reclaim in the
> target cgroup.
>
> - This file accepts a single key, the number of bytes to reclaim.
> - No nested keys are currently supported.
> + This file accepts a string which containers thhe number of bytes

contains* the*

I think this statement was only important because no keys were
supported, so I think we can remove it completely and rely on
documenting the supported keys below like other interfaces, see my
next comment.

> + to reclaim.
>
> Example::
>
> @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
> This means that the networking layer will not adapt based on
> reclaim induced by memory.reclaim.
>
> + This file also allows the user to specify the swappiness value
> + to be used for the reclaim. For example:
> +
> + echo "1G swappiness=60" > memory.reclaim
> +
> + The above instructs the kernel to perform the reclaim with
> + a swappiness value of 60. Note that this has the same semantics
> + as the vm.swappiness sysctl - it sets the relative IO cost of
> + reclaiming anon vs file memory but does not allow for reclaiming
> + specific amounts of anon or file memory.
> +

Can we instead follow the same format used by other nested-keyed files
(e.g. io.max)? This usually involves a table of supported keys and
such.

> memory.peak
> A read-only single value file which exists on non-root
> cgroups.
[..]
> @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
> unsigned int nr_retries = MAX_RECLAIM_RETRIES;
> unsigned long nr_to_reclaim, nr_reclaimed = 0;
> unsigned int reclaim_options;
> - int err;
> + char *old_buf, *start;
> + substring_t args[MAX_OPT_ARGS];
> + int swappiness = -1;
>
> buf = strstrip(buf);
> - err = page_counter_memparse(buf, "", &nr_to_reclaim);
> - if (err)
> - return err;
> +
> + old_buf = buf;
> + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> + if (buf == old_buf)
> + return -EINVAL;
> +
> + buf = strstrip(buf);
> +
> + while ((start = strsep(&buf, " ")) != NULL) {
> + if (!strlen(start))
> + continue;
> + switch (match_token(start, if_tokens, args)) {
> + case MEMORY_RECLAIM_SWAPPINESS:
> + if (match_int(&args[0], &swappiness))
> + return -EINVAL;
> + if (swappiness < 0 || swappiness > 200)

I am not a fan of extending the hardcoded 0 and 200 values, and now
the new -1 value. Maybe it's time to create constants for the min and
max swappiness values instead of hardcoding them everywhere? This can
be a separate preparatory patch. Then, -1 (or any invalid value) can
also be added as a constant with a useful name, instead of passing -1
to all other callers.

This should make the code a little bit more readable and easier to extend.