Re: [PATCH] memcg: Default value setting in memcg-v1

From: Michal Hocko
Date: Tue Apr 11 2023 - 05:11:13 EST


On Thu 06-04-23 16:14:50, Shaun Tancheff wrote:
> From: Shaun Tancheff <shaun.tancheff@xxxxxxx>
>
> Setting min, low and high values with memcg-v1
> provides benefits for users that are unable to update
> to memcg-v2.

min, low and high limits are cgroup v2 concepts which do not fit the
v1 implementation. The primary reason the v2 interface was created
is that the existing v1 interfaces and internal constraints (most
notably the soft limit and tasks in inner nodes for memcg) were not
reformable. It is really hard to define proper semantics for memory
protection when inner node tasks can compete with the hierarchy beneath.

> Setting min, low and high can be set in memcg-v1
> to apply enough memory pressure to effectively throttle
> filesystem I/O without hitting memcg oom.

This is not a proper way to achieve that. As I've already stated in the
previous submission of a similar patch
(20230330202232.355471-1-shaun.tancheff@xxxxxxxxx), cgroup v1 dirty data
throttling has some downsides because it cannot effectively throttle
GFP_NOFS allocations. One way around that is to reduce the dirty data
limit to prevent memcg LRUs from filling up with dirty pages. I would
recommend moving forward to cgroup v2 though.
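
As a rough sketch of the dirty limit workaround, assuming the limit in
question is the global one set through the vm.dirty_bytes and
vm.dirty_background_bytes sysctls: the helper below only formats the
sysctl.conf lines for a chosen byte budget. The helper name, the 256 MiB
budget and the 50% background fraction are illustrative assumptions, not
recommended values.

```python
from typing import List

MIB = 1 << 20


def dirty_sysctl_lines(dirty_bytes: int, background_fraction: float = 0.5) -> List[str]:
    """Format sysctl.conf lines for a byte-based global dirty limit.

    Hypothetical helper: dirty_bytes is the hard throttling threshold,
    and the background writeback threshold is derived from it using
    background_fraction (an assumption made for this sketch).
    """
    if dirty_bytes <= 0:
        raise ValueError("dirty_bytes must be positive")
    if not 0.0 < background_fraction < 1.0:
        raise ValueError("background_fraction must be between 0 and 1")
    background_bytes = int(dirty_bytes * background_fraction)
    return [
        f"vm.dirty_background_bytes = {background_bytes}",
        f"vm.dirty_bytes = {dirty_bytes}",
    ]


if __name__ == "__main__":
    # Illustrative budget only; size it against the smallest memcg limit in use.
    for line in dirty_sysctl_lines(256 * MIB):
        print(line)
```

The byte-based knobs are used here instead of vm.dirty_ratio because they
can be set well below 1% of memory on large machines.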

> This can be enabled by setting the sysctl values:
> vm.memcg_v1_min_default
> vm.memcg_v1_low_default
> vm.memcg_v1_high_default
>
> When a memory control group is newly created the
> min, low and high values are set to percent of the
> maximum based on the min, low and high default
> values respectively.

This also looks like an anti-pattern in the cgroup world, for two
reasons. First of all, min and low (reclaim protection) are hierarchical,
and a global default value makes very little sense for anything other
than flat hierarchies; even then it makes it really easy to misconfigure
the system.
Also, a percentage is a very suboptimal interface in general, as the
granularity is just too coarse for anything other than small limits.
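
To put numbers on the granularity point, here is a small sketch that
computes the smallest step an integer percentage can express for a given
limit. The function name and the sample limits are made up for
illustration, and it assumes the percentage is applied to the group's
maximum, as the patch description says.

```python
MIB = 1 << 20
GIB = 1 << 30


def percent_step_bytes(limit_bytes: int, percent: int = 1) -> int:
    """Bytes covered by `percent` percent of a limit (integer arithmetic)."""
    if limit_bytes < 0 or not 0 <= percent <= 100:
        raise ValueError("limit must be non-negative and percent within 0..100")
    return limit_bytes * percent // 100


if __name__ == "__main__":
    # Sample limits chosen for illustration only.
    for limit in (100 * MIB, 16 * GIB, 1024 * GIB):
        step = percent_step_bytes(limit)
        print(f"limit {limit / GIB:8.2f} GiB -> 1% step {step / MIB:10.2f} MiB")
```

With a 100 MiB limit a single percent is 1 MiB, which is workable; with a
1 TiB limit the same single percent is roughly 10 GiB, so nothing finer
than that can be configured.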

> This resolves an issue with memory pressure when users
> initiate unbounded I/O on various file systems such as
> ext4, XFS and NFS.

Filesystems should still be controllable by dirty limits. This might
lead to suboptimal IO throughput, but it might be a better workaround
if you cannot afford to move to cgroup v2. The v1 interface is considered
legacy and support is limited. New features are only added if there
is absolutely no other way to keep legacy applications running.

HTH
--
Michal Hocko
SUSE Labs