Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller

From: Johannes Weiner
Date: Mon Oct 02 2023 - 10:50:33 EST


On Mon, Oct 02, 2023 at 03:43:19PM +0200, Michal Hocko wrote:
> On Wed 27-09-23 17:57:22, Nhat Pham wrote:
> > Currently, hugetlb memory usage is not accounted for in the memory
> > controller, which could lead to memory overprotection for cgroups with
> > hugetlb-backed memory. This has been observed in our production system.
> >
> > This patch rectifies this issue by charging the memcg when the hugetlb
> > folio is allocated, and uncharging when the folio is freed (analogous to
> > the hugetlb controller).
>
> This changelog is missing a lot of information. Both about the usecase
> (we do not want to fish that out from archives in the future) and the
> actual implementation and the reasoning behind that.
>
> AFAICS you have decided to charge on the hugetlb use rather than hugetlb
> allocation to the pool. I suspect the underlying reasoning is that pool
> pages do not belong to anybody. This is a deliberate decision and it
> should be documented as such.
>
> It is also very important to describe subtle behavior properties that
> might be rather unintuitive to users. Most notably
> - there is no hugetlb pool management involved in the memcg
>   controller. One has to use hugetlb controller for that purpose.
>   Also the pre allocated pool as such doesn't belong to anybody so the
>   memcg host overcommit management has to consider it when configuring
>   hard limits.

+1

> - memcg limit reclaim doesn't assist hugetlb pages allocation when
>   hugetlb overcommit is configured (i.e. pages are not consumed from the
>   pool) which means that the page allocation might disrupt workloads
>   from other memcgs.
> - failure to charge a hugetlb page results in SIGBUS rather
>   than memcg oom killer. That could be the case even if the
>   hugetlb pool still has pages available and there is
>   reclaimable memory in the memcg.

Are these actually true? AFAICS, regardless of whether the page comes
from the pool or the buddy allocator, the memcg code will go through
the regular charge path, attempt reclaim, and OOM if that fails.
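
The behavior described above can be modeled with a small user-space
sketch. This is not kernel code, and every name in it (toy_memcg,
toy_reclaim, toy_charge) is hypothetical. It only illustrates the claim
that the charge path does not look at where the page came from: it
checks the limit, attempts reclaim within the group, and takes the OOM
path only if reclaim falls short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; every name here is hypothetical, not a kernel symbol. */
enum charge_result { CHARGE_OK, CHARGE_OOM };

struct toy_memcg {
	size_t limit;		/* hard limit, in pages */
	size_t usage;		/* pages currently charged */
	size_t reclaimable;	/* charged pages that reclaim could free */
};

/* Free up to 'want' pages from the group; return how many were freed. */
static size_t toy_reclaim(struct toy_memcg *cg, size_t want)
{
	size_t freed = want < cg->reclaimable ? want : cg->reclaimable;

	cg->reclaimable -= freed;
	cg->usage -= freed;
	return freed;
}

/*
 * Charge 'nr_pages' to the group. The page's origin (hugetlb pool or
 * buddy allocator) is deliberately not a parameter: in this model the
 * charge path never looks at it.
 */
static enum charge_result toy_charge(struct toy_memcg *cg, size_t nr_pages)
{
	assert(nr_pages > 0);

	if (cg->usage + nr_pages > cg->limit) {
		size_t over = cg->usage + nr_pages - cg->limit;

		/* Over the limit: attempt reclaim within this group first. */
		if (toy_reclaim(cg, over) < over)
			return CHARGE_OOM;	/* reclaim fell short: OOM path */
	}
	cg->usage += nr_pages;
	return CHARGE_OK;
}
```

In this model a charge that exceeds the limit succeeds as long as the
group has enough reclaimable pages, and fails only once reclaim cannot
make room. Whatever was reclaimed before the failure stays reclaimed.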