Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller

From: Johannes Weiner
Date: Tue Oct 03 2023 - 08:54:50 EST


On Mon, Oct 02, 2023 at 05:18:27PM -0700, Nhat Pham wrote:
> Currently, hugetlb memory usage is not accounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
>
> For instance, here is one of our use cases: suppose there are two 32G
> containers. The machine is booted with hugetlb_cma=6G, and each
> container may or may not use up to 3 gigantic pages, depending on the
> workload within it. The rest is anon, cache, slab, etc. We can set the
> hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness.
> But it is very difficult to configure memory.max to keep overall
> consumption, including anon, cache, slab etc. fair.
>
> What we have had to resort to is constantly polling hugetlb usage and
> readjusting memory.max. A similar procedure is applied to other memory
> limits (e.g. memory.low). However, this is cumbersome and error-prone.
> Furthermore, when memory limit corrections are delayed (e.g. when
> hugetlb usage changes between consecutive runs of the userspace
> agent), the system can be left in an over- or underprotected state.
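
A minimal sketch of the stale-limit window described above, assuming (hypothetically) that the agent sets memory.max to a fixed 32G container budget minus the hugetlb usage it last observed. The helper names are illustrative only and are not part of any real agent.

```python
GiB = 1 << 30


def readjust_memory_max(budget: int, hugetlb_usage: int) -> int:
    """Hypothetical agent step: memory.max does not cover hugetlb, so the
    rest of the container's budget goes to anon/cache/slab."""
    return max(budget - hugetlb_usage, 0)


def worst_case_footprint(stale_memory_max: int, hugetlb_now: int) -> int:
    """Total usage reachable before the agent's next run: the stale
    memory.max plus whatever hugetlb the container holds now."""
    return stale_memory_max + hugetlb_now


budget = 32 * GiB
# Run N: the container holds no gigantic pages yet.
limit = readjust_memory_max(budget, 0)
# Before run N+1 it faults in 3 x 1G gigantic pages.
print(limit // GiB, worst_case_footprint(limit, 3 * GiB) // GiB)  # 32 35
```

Under these assumptions, the container can reach 35G against a 32G budget until the agent runs again.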
>
> This patch rectifies this issue by charging the memcg when the hugetlb
> folio is utilized, and uncharging when the folio is freed (analogous to
> the hugetlb controller). Note that we do not charge when the folio is
> allocated to the hugetlb pool, because at this point it is not owned by
> any memcg.
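
A toy model of the charging lifecycle described above (no charge when the pool is filled, charge on use, uncharge on free), with the SIGBUS-on-charge-failure behaviour listed below. It is a sketch under simplifying assumptions: reclaim is not modelled, sizes are plain byte counts, and `Memcg`/`HugetlbPool` are hypothetical names, not kernel structures.

```python
GiB = 1 << 30


class Memcg:
    """Toy memory-controller counter: usage against a hard limit (memory.max)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.usage = 0

    def try_charge(self, nr_bytes: int) -> bool:
        # Reclaim is not modelled: the charge simply fails at the limit.
        if self.usage + nr_bytes > self.limit:
            return False
        self.usage += nr_bytes
        return True

    def uncharge(self, nr_bytes: int) -> None:
        self.usage -= nr_bytes


class HugetlbPool:
    """Toy hugetlb pool. Filling the pool charges no memcg."""

    def __init__(self, nr_pages: int, page_size: int) -> None:
        self.nr_free = nr_pages
        self.page_size = page_size

    def fault(self, memcg: Memcg) -> str:
        """Hand a folio to a task: its memcg is charged at this point."""
        if self.nr_free == 0:
            return "SIGBUS: pool empty"
        if not memcg.try_charge(self.page_size):
            return "SIGBUS: memcg charge failed"
        self.nr_free -= 1
        return "ok"

    def free(self, memcg: Memcg) -> None:
        """Return a folio to the pool: uncharge the memcg that owned it."""
        memcg.uncharge(self.page_size)
        self.nr_free += 1


pool = HugetlbPool(nr_pages=6, page_size=GiB)  # six 1G gigantic pages
cg = Memcg(limit=2 * GiB)
print([pool.fault(cg) for _ in range(3)], pool.nr_free)
```

The third fault fails with four pages still free in the pool, because the cgroup limit, not the pool, is exhausted.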
>
> Some caveats to consider:
> * This feature is only available on cgroup v2.
> * There is no hugetlb pool management involved in the memory
> controller. As stated above, hugetlb folios are only charged to the
> memory controller when they are used. Host overcommit management
> has to take this into account when configuring hard limits.
> * Failure to charge the memcg results in SIGBUS. This can happen
> even if the hugetlb pool still has free pages (i.e. the cgroup
> limit is hit and the reclaim attempt fails).
> * When this feature is enabled, hugetlb pages contribute to memory
> reclaim protection. Tuning of memory.low and memory.min must take
> hugetlb memory into account.
> * Hugetlb pages utilized while this option is not selected will not
> be tracked by the memory controller (even if cgroup v2 is remounted
> later on).
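
A back-of-the-envelope sketch of the tuning caveats above for the two-container example, assuming (hypothetically) that an operator wants to protect a fixed amount of reclaimable memory and expects up to 3G of hugetlb per container. Because charged hugetlb counts toward the same usage that memory.max, memory.low and memory.min are compared against, the protection setting has to include the hugetlb share. The function name and the additive sizing rule are illustrative assumptions, not a recommendation from the patch.

```python
GiB = 1 << 30


def limits_with_hugetlb_accounting(container_size: int,
                                   hugetlb_max: int,
                                   reclaimable_protect: int) -> dict:
    """Hypothetical sizing rule once hugetlb is charged to the memcg.

    memory.max can stay at the container size, since it now bounds
    anon + cache + slab + hugetlb together. memory.low must cover the
    hugetlb share on top of the reclaimable memory to be protected.
    """
    return {
        "memory.max": container_size,
        "memory.low": reclaimable_protect + hugetlb_max,
    }


cfg = limits_with_hugetlb_accounting(32 * GiB, 3 * GiB, 20 * GiB)
print({k: v // GiB for k, v in cfg.items()})  # {'memory.max': 32, 'memory.low': 23}
```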
>
> Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>