Re: [PATCH v4] mm: oom: introduce cpuset oom

From: Michal Hocko
Date: Tue Apr 11 2023 - 10:36:36 EST


On Tue 11-04-23 14:58:15, Gang Li wrote:
> Cpusets constrain the CPU and Memory placement of tasks.
> The `CONSTRAINT_CPUSET` type in oom has existed for a long time, but
> has never been utilized.
>
> When a process in a cpuset that constrains memory placement triggers
> an oom, it may kill a completely irrelevant process on other NUMA
> nodes, which will not release any memory for this cpuset.
>
> We can easily achieve node-aware oom by using `CONSTRAINT_CPUSET` and
> selecting the victim from cpusets with the same mems_allowed as the
> current one.
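
For the record, my understanding is that the victim filtering boils down
to something like the following (my own paraphrase rather than the
patch's actual code, and the helper name below is made up), reusing the
existing cpuset_mems_allowed_intersects() helper and checking for
intersecting rather than strictly equal mems_allowed:

	/*
	 * Sketch only: for a CONSTRAINT_CPUSET oom, only consider tasks
	 * whose cpuset mems_allowed intersects with the allocating task's
	 * mems_allowed, so that killing them can actually free memory on
	 * the constrained nodes.
	 */
	static bool cpuset_constrained_eligible(struct task_struct *tsk)
	{
		return cpuset_mems_allowed_intersects(current, tsk);
	}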

I believe it still wouldn't hurt to be more specific here.
CONSTRAINT_CPUSET is rather obscure. Looking at this just makes my head
spin.
/* Check this allocation failure is caused by cpuset's wall function */
for_each_zone_zonelist_nodemask(zone, z, oc->zonelist,
			highest_zoneidx, oc->nodemask)
	if (!cpuset_zone_allowed(zone, oc->gfp_mask))
		cpuset_limited = true;

Does this even work properly and why? prepare_alloc_pages() sets
oc->nodemask to current->mems_allowed, but the above gives us
cpuset_limited only if there is at least one zone/node that is not
oc->nodemask compatible. So it seems like this wouldn't ever get set
unless oc->nodemask got reset somewhere. This is a maze indeed. Is there
any reason why we cannot rely on __GFP_HARDWALL here? Or should we
instead rely on the fact that the nodemask should be the same as
current->mems_allowed?
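
If __GFP_HARDWALL is a reliable signal here, something along these lines
(an untested sketch of mine, not a concrete proposal) would at least be
easier to follow than probing the zonelist:

	/*
	 * Untested sketch: treat the failure as cpuset constrained
	 * whenever cpusets are in use and the allocation is hardwalled.
	 */
	if (cpusets_enabled() && (oc->gfp_mask & __GFP_HARDWALL))
		cpuset_limited = true;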

I do realize that this is not directly related to your patch, but
considering this check has been mostly doing nothing, maybe we want to
document it better or even rework it on this occasion.

> Example:
>
> Create two processes named mem_on_node0 and mem_on_node1, each
> constrained by a cpuset. These two processes allocate memory on their
> own node. Now node0 has run out of memory, and OOM will be invoked by
> mem_on_node0.

Don't you have an actual real-life example with a properly partitioned
system which clearly misbehaves and which this patch addresses?
--
Michal Hocko
SUSE Labs