Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information

From: Waiman Long
Date: Wed Feb 02 2022 - 11:29:50 EST


On 2/2/22 03:57, Michal Hocko wrote:
> On Tue 01-02-22 11:41:19, Waiman Long wrote:
>> On 2/1/22 05:49, Michal Hocko wrote:
> [...]
>>> Could you be more specific? Offlined memcgs are still part of the
>>> hierarchy IIRC. So it shouldn't be much more than iterating the whole
>>> cgroup tree and collecting interesting data about dead cgroups.
>> What I mean is that without piggybacking on top of page_owner, we will
>> have to add a lot more code to collect and display that information,
>> which may have some overhead of its own.
> Yes, there is nothing like a free lunch. Page owner is certainly a tool
> that can be used. My main concern is that this tool doesn't really
> scale on large machines with lots of memory. It will provide very
> detailed information, but I am not sure this is particularly helpful to
> most admins (why should people process tons of allocation backtraces in
> the first place?). Wouldn't it be sufficient to have stats for each dead
> memcg to see where the memory sits?
>
> Accumulated offline memcgs are something that bothers more people, and I
> am really wondering whether we can do more for those people to evaluate
> the current state.

You won't get the stack backtrace information without page_owner enabled, and I believe that is a helpful piece of information. I don't expect page_owner to be enabled by default on production systems because of its memory overhead.

I believe you can actually see the number of memory cgroups present by looking at the /proc/cgroups file. However, you can't tell how many of them are offline memcgs. So if one suspects that there is a large number of offline memcgs, one can set up a test environment with page_owner enabled for further analysis.
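A minimal sketch of the /proc/cgroups check follows. It is in Python only for illustration, and it assumes the usual four-column layout announced by the file's header line (`#subsys_name hierarchy num_cgroups enabled`). The helper names `parse_proc_cgroups` and `memory_cgroup_count` are hypothetical, not an existing tool. As stated above, the number it reports does not tell you how many of those memcgs are offline.

```python
from typing import Dict, Optional


def parse_proc_cgroups(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/cgroups text into {subsys_name: {column: value}}.

    Assumes the four-column layout given by the header line:
    #subsys_name  hierarchy  num_cgroups  enabled
    """
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#subsys_name ..." header
        name, hierarchy, num_cgroups, enabled = line.split()[:4]
        stats[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": int(enabled),
        }
    return stats


def memory_cgroup_count(text: str) -> Optional[int]:
    """Return num_cgroups for the memory controller, or None if it is absent.

    The total alone does not say how many of these memcgs are offline.
    """
    memory = parse_proc_cgroups(text).get("memory")
    return None if memory is None else memory["num_cgroups"]


if __name__ == "__main__":
    try:
        with open("/proc/cgroups") as f:
            count = memory_cgroup_count(f.read())
    except OSError as err:
        print(f"cannot read /proc/cgroups: {err}")
    else:
        if count is None:
            print("memory controller not listed in /proc/cgroups")
        else:
            print(f"memory cgroups: {count}")
```

A count that keeps growing while the number of live cgroup directories stays flat would be one hint that offline memcgs are piling up and that a page_owner-enabled test setup is worth the effort.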

Cheers,
Longman