Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

From: Gregory Price
Date: Tue Nov 14 2023 - 10:51:09 EST


On Tue, Nov 14, 2023 at 10:43:13AM +0100, Michal Hocko wrote:
> On Fri 10-11-23 22:42:39, Gregory Price wrote:
> [...]
> > If I can ask, do you think it would be out of line to propose a major
> > refactor to mempolicy to give external tasks the ability to change a
> > running task's mempolicy *as well as* a cgroup-wide mempolicy component?
>
> No, I actually think this is a reasonable idea. pidfd_setmempolicy is a
> generally useful extension. The mempolicy code is heavily current task
> based and there might be some challenges but I believe this will a)
> improve the code base and b) allow more usecases.

Just read up on the pidfd_set_mempolicy lore, and yes I'm seeing all the
same problems (I know there was discussion of vma policies, but I think
that can be a topic for later). Have some thoughts on this, but will
take some time to work through a few refactoring tickets first.

>
> That being said, I still believe that a cgroup based interface is a much
> better choice over a global one. Cpusets seem to be a good fit as the
> controller does control memory placement wrt NUMA interfaces.

I think cpusets is a non-starter due to the global spinlock required when
reading information from it:

https://elixir.bootlin.com/linux/latest/source/kernel/cgroup/cpuset.c#L391

Unless the proposal is to place the weights as a global cgroups value,
in which case I think it would be better placed in default_mempolicy :]
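
To make the cpuset concern concrete, here is a rough userspace sketch of
the pattern I'm worried about. It is not kernel code: global_lock stands
in for the cpuset spinlock linked above (I'm assuming that is
callback_lock), and NR_NODES, node_weight, set_weight() and get_weight()
are names made up for illustration. The point is only that every reader
on the allocation path has to take the same lock just to fetch one
weight.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */

/* Stand-in for one global spinlock guarding cpuset-style state. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Hypothetical per-node interleave weights protected by global_lock. */
static unsigned char node_weight[NR_NODES] = { 3, 1, 1, 1 };

static void global_lock_acquire(void)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&global_lock,
						 memory_order_acquire))
		;
}

static void global_lock_release(void)
{
	atomic_flag_clear_explicit(&global_lock, memory_order_release);
}

/* Writer: rare, e.g. an administrator updating a weight. */
static void set_weight(int nid, unsigned char weight)
{
	assert(nid >= 0 && nid < NR_NODES);
	global_lock_acquire();
	node_weight[nid] = weight;
	global_lock_release();
}

/*
 * Reader: would run on every interleaved allocation. All callers on
 * all CPUs serialize on the same lock just to read a single byte.
 */
static unsigned char get_weight(int nid)
{
	unsigned char weight;

	assert(nid >= 0 && nid < NR_NODES);
	global_lock_acquire();
	weight = node_weight[nid];
	global_lock_release();
	return weight;
}
```

With one thread this is harmless; the cost shows up when many CPUs call
get_weight() at once, which is what the allocation path would do.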