[RFC PATCH v4 3/3] Documentation: sysfs entries for cgroup.memory.interleave_weights

From: Gregory Price
Date: Wed Nov 08 2023 - 19:25:43 EST


cgroup.memory.interleave_weights is an array of NUMA node weights
to be used for interleaving when mempolicy utilizes MPOL_F_IL_WEIGHTING.

By default, weights are set to 1, and are only displayed for possible
NUMA nodes (ones which are or may become online).

Node weights are set individually, and by default are inherited from
the parent cgroup. Inherited weights may be overridden, and overridden
weights may be reverted to inherit from the parent.

Signed-off-by: Gregory Price <gregory.price@xxxxxxxxxxxx>
---
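Not part of the patch: a minimal user-space sketch of the distribution
described above, assuming the "node:weight,..." format shown in the
documentation and assuming nodes are visited in ascending order with
`weight` consecutive pages each. The helper names parse_weights and
interleave are hypothetical, and the kernel's actual placement order may
differ; this only models the resulting ratio.

```python
from itertools import islice


def parse_weights(text):
    """Parse a string such as "0:3,1:1" into a {node: weight} dict."""
    weights = {}
    for entry in text.strip().split(","):
        node, weight = entry.split(":")
        weights[int(node)] = int(weight)
    return weights


def interleave(weights):
    """Yield node ids forever, `weight` consecutive pages per node.

    Assumes at least one node has a positive weight.
    """
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


# First eight page placements for a 3:1 ratio on nodes 0 and 1.
placements = list(islice(interleave(parse_weights("0:3,1:1")), 8))
print(placements)  # [0, 0, 0, 1, 0, 0, 0, 1]
```

With the default weights (0:1,1:1) the same model degenerates to plain
round-robin, which matches the unweighted interleave behavior.
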
Documentation/admin-guide/cgroup-v2.rst | 45 +++++++++++++++++++
.../admin-guide/mm/numa_memory_policy.rst | 11 +++++
2 files changed, 56 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index b26b5274eaaf..273dbd01a7ec 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1640,6 +1640,51 @@ PAGE_SIZE multiple when read back.
Shows pressure stall information for memory. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.

+ memory.interleave_weights
+ An array of weights to be used for the interleave mempolicy.
+
+ By default, weights are set to 1, and are only displayed for
+ possible NUMA nodes (ones which are or may become online).
+
+ Example::
+
+ cat memory.interleave_weights
+ 0:1,1:1
+
+ Here both nodes 0 and 1 are set to weight 1. Node weights are
+ set individually.
+
+ Example::
+
+ echo "0:3" > memory.interleave_weights
+ echo "1:1" > memory.interleave_weights
+
+ Here we set a 3:1 ratio for nodes 0 and 1. Mempolicy will
+ allocate 3 pages on node 0 before allocating 1 page on node 1.
+
+ Child cgroups inherit weights from their parent. A child may
+ override them, or revert to inheriting the parent weights by
+ writing -1:0 to memory.interleave_weights.
+
+ Example::
+
+ echo "0:3" > parent/memory.interleave_weights
+ echo "1:1" > parent/memory.interleave_weights
+
+ # Child cgroup inherits these weights
+ cat parent/child/memory.interleave_weights
+ 0:3,1:1
+
+ # Override the weights
+ echo "0:5" > parent/child/memory.interleave_weights
+ echo "1:2" > parent/child/memory.interleave_weights
+ cat parent/child/memory.interleave_weights
+ 0:5,1:2
+
+ # Revert the child back to inheriting the parent weights
+ echo "-1:0" > parent/child/memory.interleave_weights
+ cat parent/child/memory.interleave_weights
+ 0:3,1:1

Usage Guidelines
~~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index eca38fa81e0f..7c82e38dbd2b 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -243,6 +243,17 @@ MPOL_INTERLEAVED
address range or file. During system boot up, the temporary
interleaved system default policy works in this mode.

+ The default interleave behavior is round-robin; however, cgroups
+ implement an interleave_weights feature which can be used to
+ change the interleave distribution. When weights are used,
+ the behavior above remains the same, but placement adheres to
+ weights such that multiple allocations will respect the set
+ weights. For example, if the weights for nodes 0 and 1 are
+ 3 and 1 respectively (0:3,1:1), then 3 pages will be allocated
+ on node 0 for every 1 page allocated on node 1.
+
+ For more details, see `Documentation/admin-guide/cgroup-v2.rst`.
+
MPOL_PREFERRED_MANY
This mode specifies that the allocation should be preferably
satisfied from the nodemask specified in the policy. If there is
--
2.39.1