Re: Re: [PATCH v4] mm: oom: introduce cpuset oom

From: Gang Li
Date: Tue Apr 11 2023 - 09:05:38 EST




On 2023/4/11 20:23, Michal Koutný wrote:
Hello.

On Tue, Apr 11, 2023 at 02:58:15PM +0800, Gang Li <ligang.bdlg@xxxxxxxxxxxxx> wrote:
+	cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
+		if (nodes_equal(cs->mems_allowed, task_cs(current)->mems_allowed)) {
+			css_task_iter_start(&(cs->css), CSS_TASK_ITER_PROCS, &it);
+			while (!ret && (task = css_task_iter_next(&it)))
+				ret = fn(task, arg);
+			css_task_iter_end(&it);
+		}
+	}
+	rcu_read_unlock();
+	cpuset_read_unlock();
+	return ret;
+}

I see this traverses all cpusets without the hierarchy actually
mattering that much. Wouldn't CONSTRAINT_CPUSET be better achieved by
globally (or per-memcg) scanning all processes and filtering with:

Oh I see, you mean scanning all processes in all cpusets and scanning
all processes globally are equivalent.

nodes_intersect(current->mems_allowed, p->mems_allowed)

Perhaps it would be better to use nodes_equal first, and if no suitable
victim is found, then downgrade to nodes_intersect?
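
A minimal userspace sketch of that two-pass idea, not the patch itself. Everything here is a stand-in: `nodemask`, `struct task`, the `badness` field (0 is assumed to mean "not eligible") and the function names are hypothetical, and a plain 64-bit mask replaces the kernel's nodemask_t. Note that the real kernel helper is spelled nodes_intersects().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef uint64_t nodemask;

struct task {
	const char *name;
	nodemask mems_allowed;
	unsigned long badness;	/* hypothetical score; 0 = not eligible */
};

int mask_equal(nodemask a, nodemask b)
{
	return a == b;
}

int mask_intersects(nodemask a, nodemask b)
{
	return (a & b) != 0;
}

/* Return the eligible task with the highest badness accepted by match(). */
const struct task *scan_tasks(const struct task *tasks, size_t n,
			      const struct task *current,
			      int (*match)(nodemask, nodemask))
{
	const struct task *victim = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct task *p = &tasks[i];

		if (p->badness == 0)
			continue;
		if (!match(current->mems_allowed, p->mems_allowed))
			continue;
		if (!victim || p->badness > victim->badness)
			victim = p;
	}
	return victim;
}

/* Pass 1: identical mems_allowed. Pass 2: fall back to any overlap. */
const struct task *select_victim(const struct task *tasks, size_t n,
				 const struct task *current)
{
	const struct task *victim;

	assert(current != NULL);
	victim = scan_tasks(tasks, n, current, mask_equal);
	if (!victim)
		victim = scan_tasks(tasks, n, current, mask_intersects);
	return victim;
}
```

With this ordering, a task with an overlapping but different mask is only considered once no eligible task shares exactly the same set of nodes.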

The NUMA balancing mechanism tends to keep a task's memory on the same
NUMA node, so if the selected victim's memory happens to sit on a node
that is not in the current process's mems_allowed, killing it still
won't free up any memory that the current process can use.

In this example:

A->mems_allowed: 0,1
B->mems_allowed: 1,2
nodes_intersect(A->mems_allowed, B->mems_allowed) == true

Memory Distribution:
+=======+=======+=======+
| Node0 | Node1 | Node2 |
+=======+=======+=======+
|   A   |       |       |
+-------+-------+-------+
|       |       |   B   |
+-------+-------+-------+

Process A invokes the OOM killer, which then kills B.
But A still can't get any free memory on Node0 or Node1.
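
The same example as a small userspace sketch. The `resident` field (the nodes that currently hold a task's pages) and all names below are hypothetical and only model the table above; they are not kernel structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n. */
typedef uint64_t nodemask;
#define NODE(n) ((nodemask)1 << (n))

struct proc {
	nodemask mems_allowed;	/* nodes the task may allocate from */
	nodemask resident;	/* hypothetical: nodes holding its pages now */
};

/* The proposed filter: do the two allowed sets overlap at all? */
int allowed_intersects(const struct proc *a, const struct proc *b)
{
	return (a->mems_allowed & b->mems_allowed) != 0;
}

/* Does killing victim free memory on a node that current may use? */
int kill_helps(const struct proc *current, const struct proc *victim)
{
	return (current->mems_allowed & victim->resident) != 0;
}

/* A is allowed on nodes 0,1 and B on nodes 1,2; B's pages sit on node 2. */
void example_from_thread(void)
{
	struct proc a = { NODE(0) | NODE(1), NODE(0) };
	struct proc b = { NODE(1) | NODE(2), NODE(2) };

	assert(allowed_intersects(&a, &b));	/* B passes the filter */
	assert(!kill_helps(&a, &b));		/* but killing B frees nothing for A */
}
```

The filter on mems_allowed accepts B, while the check against where B's memory actually resides shows the kill would not relieve A.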

(`current` triggers the OOM, `p` is the iterated task)
?

Thanks,
Michal