Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being oom killed

From: Michal Hocko
Date: Wed Nov 30 2022 - 08:15:13 EST


On Wed 30-11-22 15:01:58, chengkaitao wrote:
> From: chengkaitao <pilgrimtao@xxxxxxxxx>
>
> We created a new interface <memory.oom.protect> for memory. If the OOM
> killer is invoked under a parent memory cgroup, and the memory usage of
> a child cgroup is within its effective oom.protect boundary, the
> cgroup's tasks won't be OOM killed unless there are no unprotected
> tasks in other child cgroups. It draws on the logic of
> <memory.min/low> in the inheritance relationship.
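
A minimal sketch of the victim-selection rule the changelog describes, assuming each candidate already carries a flag saying whether its memcg usage is within its effective oom.protect. The names struct oom_candidate, is_protected and select_victim() are hypothetical and not taken from the patch. Protected tasks are only considered once no unprotected, killable task is left:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate record, for illustration only. */
struct oom_candidate {
	long badness;      /* e.g. an oom_badness() result; LONG_MIN = never kill */
	bool is_protected; /* assumed: memcg usage within effective oom.protect */
};

/*
 * Return the index of the chosen victim, or -1 if nothing is killable.
 * The highest-badness unprotected task wins; protected tasks are a fallback.
 */
static int select_victim(const struct oom_candidate *c, size_t n)
{
	int unprot = -1, prot = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		int *best = c[i].is_protected ? &prot : &unprot;

		if (c[i].badness == LONG_MIN)
			continue; /* unkillable, e.g. oom_score_adj == -1000 */
		if (*best < 0 || c[i].badness > c[*best].badness)
			*best = (int)i;
	}
	return unprot >= 0 ? unprot : prot;
}
```

Under this rule a task with a very high badness sitting in a protected memcg is skipped as long as any unprotected candidate exists, which is exactly the case the question below is about.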

Could you be more specific about usecases? How do you tune oom.protect
wrt other tunables? How does this interact with the oom_score_adj
tuning (e.g. a first-hand oom victim with oom_score_adj 1000 sitting
in an oom-protected memcg)?
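
For reference, a simplified model of how oom_badness() in mm/oom_kill.c folds oom_score_adj into the score, with the mm counters passed in as plain page counts (model_oom_badness() is a hypothetical stand-in, not the kernel function). With oom_score_adj 1000 the task gets roughly totalpages added on top of its footprint, so it outranks everything else unless protection overrides it:

```c
#include <limits.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/*
 * Simplified model of oom_badness(): rss, swap entries and page-table
 * pages are plain page counts here instead of being read from an mm.
 */
static long model_oom_badness(unsigned long rss, unsigned long swapents,
			      unsigned long pgtable_pages, long adj,
			      unsigned long totalpages)
{
	long points;

	if (adj == OOM_SCORE_ADJ_MIN)
		return LONG_MIN; /* never selected */

	points = (long)(rss + swapents + pgtable_pages);
	/* Normalize oom_score_adj to pages, as the kernel does. */
	adj *= (long)(totalpages / 1000);
	return points + adj;
}
```

With totalpages = 10000, a 100-page task scores 10100 at oom_score_adj 1000 and 100 at 0.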

I haven't really read through the whole patch but this struck me as odd.

> @@ -552,8 +552,19 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
> unsigned long totalpages = totalram_pages() + total_swap_pages;
> unsigned long points = 0;
> long badness;
> +#ifdef CONFIG_MEMCG
> + struct mem_cgroup *memcg;
>
> - badness = oom_badness(task, totalpages);
> + rcu_read_lock();
> + memcg = mem_cgroup_from_task(task);
> + if (memcg && !css_tryget(&memcg->css))
> + memcg = NULL;
> + rcu_read_unlock();
> +
> + update_parent_oom_protection(root_mem_cgroup, memcg);
> + if (memcg)
> + css_put(&memcg->css);
> +#endif
> + badness = oom_badness(task, totalpages, MEMCG_OOM_PROTECT);

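The badness means a different thing depending on which memcg hierarchy
subtree you look at: an OOM triggered inside a memcg would evaluate
protection relative to that memcg, not to root_mem_cgroup. Scaling it
based on the global oom could get really misleading.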
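
A minimal sketch of why the result depends on the subtree, assuming oom.protect distributes a parent's effective protection among its children roughly the way memory.low does (simplified: no recursive-protection mode, no redistribution of unclaimed protection), and that the cgroup where the OOM is triggered plays the role of the reclaim root, so its direct children simply get their own setting. struct cg, effective_protect() and the example tree are hypothetical:

```c
#include <stddef.h>

/* Hypothetical, flattened view of a memcg, for illustration only. */
struct cg {
	const struct cg *parent;
	unsigned long usage;              /* pages charged */
	unsigned long protect;            /* oom.protect setting, in pages */
	unsigned long siblings_protected; /* sum of min(usage, protect) over this cg and its siblings */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Effective protection of @cg for an OOM triggered at @oom_root.
 * @oom_root must be a strict ancestor of @cg.
 */
static unsigned long effective_protect(const struct cg *cg,
				       const struct cg *oom_root)
{
	unsigned long prot = min_ul(cg->usage, cg->protect);
	unsigned long parent_eff;

	/* Direct children of the OOM root get their own setting. */
	if (cg->parent == oom_root)
		return prot;

	/* Otherwise the parent's effective protection is shared proportionally. */
	parent_eff = effective_protect(cg->parent, oom_root);
	if (cg->siblings_protected > parent_eff)
		return prot * parent_eff / cg->siblings_protected;
	return prot;
}

/* Example tree: root -> a -> {a1, a2}; a's usage is a1 + a2. */
static const struct cg root = { NULL, 0, 0, 0 };
static const struct cg a = { &root, 150, 100, 100 };
static const struct cg a1 = { &a, 100, 100, 150 };
static const struct cg a2 = { &a, 50, 100, 150 };
```

Seen from root (global OOM), a1 and a2 share a's 100 pages and get 66 and 33 pages of protection. Seen from a (an OOM inside a), they get 100 and 50. A single number computed against root_mem_cgroup cannot describe both cases.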

--
Michal Hocko
SUSE Labs