Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

From: Gregory Price
Date: Thu Nov 09 2023 - 11:34:48 EST


On Thu, Nov 09, 2023 at 11:02:23AM +0100, Michal Hocko wrote:
> On Wed 08-11-23 19:25:14, Gregory Price wrote:
> > This patchset implements weighted interleave and adds a new cgroup
> > sysfs entry: cgroup/memory.interleave_weights (excluded from root).
>
> Why have you chosen memory controler rather than cpuset controller?
> TBH I do not think memcg is the best fit because traditionally memcg
> accounts consumption rather than memory placement. This means that the
> memory is already allocated when it is charged for a memcg. On the other
> hand cpuset controller is the one to control the allocation placement so
> it would seem a better fit.
> --
> Michal Hocko
> SUSE Labs

Actually going to walk back my last email, memcg actually feels more
correct than cpuset, if only because of what the admin-guide says:

"""
The "memory" controller regulates distribution of memory. [... snip ...]

While not completely water-tight, all major memory usages by a given
cgroup are tracked so that the total memory consumption can be accounted
and controlled to a reasonable extent.
"""

'And controlled to a reasonable extent' seems to fit the description of
this mechanism better than the cpuset description:

"""
The "cpuset" controller provides a mechanism for constraining the CPU and
memory node placement of tasks to only the resources specified in the
cpuset interface files in a task's current cgroup.
"""

This is not a constraining interface... it's "more of a suggestion". In
particular, anything not using interleave doesn't even care about these
weights at all.

The distribution is only enforced for allocation, it does not cause
migrations... thought that would be a neat idea. This is explicitly
why the interface does not allow a weight of 0 (the not should be
omitted from the policy nodemask or cpuset instead).

Even if this were designed to enforce a particular distribution of
memory, I'm not certain that would belong in cpusets either - but I
suppose that is a separate discussion. It's possible this array of
weights could be used to do both, but it seems (at least on the surface)
that making this a hard control is an excellent way to induce OOMs where
you may not want them.


Anyway, summarizing: After a bit of reading, this does seem to map
better to the "accounting consumption" subsystem than the "constrain"
subsystem. However, if you think it's better suited for cpuset, I'm
happy to push in that direction.

~Gregory