Re: [RFC PATCH 0/5] mm: Select victim memcg using BPF_OOM_POLICY

From: Michal Hocko
Date: Mon Jul 31 2023 - 09:24:07 EST


On Mon 31-07-23 14:00:22, Chuyi Zhou wrote:
> Hello, Michal
>
> 在 2023/7/28 01:23, Michal Hocko 写道:
[...]
> > This sounds like a very specific oom policy and that is fine. But the
> > interface shouldn't be bound to any concepts like priorities let alone
> > be bound to memcg based selection. Ideally the BPF program should get
> > the oom_control as an input and either get a hook to kill process or if
> > that is not possible then return an entity to kill (either process or
> > set of processes).
>
> Here are two interfaces I can think of. I was wondering if you could give me
> some feedback.
>
> 1. Add a new hook in select_bad_process(), we can attach it and return a set
> of pids or cgroup_ids which are pre-selected by user-defined policy,
> suggested by Roman. Then we could use oom_evaluate_task to find a final
> victim among them. It's user-friendly and we can offload the OOM policy to
> userspace.
>
> 2. Add a new hook in oom_evaluate_task() and return a point to override the
> default oom_badness return-value. The simplest way to use this is to protect
> certain processes by setting the minimum score.
>
> Of course if you have a better idea, please let me know.

Hooking into oom_evaluate_task seems the least disruptive to the
existing oom killer implementation. I would start by planing with that
and see whether useful oom policies could be defined this way. I am not
sure what is the best way to communicate user input so that a BPF prgram
can consume it though. The interface should be generic enough that it
doesn't really pre-define any specific class of policies. Maybe we can
add something completely opaque to each memcg/task? Does BPF
infrastructure allow anything like that already?

--
Michal Hocko
SUSE Labs