Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task

From: Martin KaFai Lau
Date: Thu Aug 10 2023 - 15:41:15 EST


First, I'm a bit concerned about implicit restrictions we apply to bpf programs
which will potentially be executed thousands of times under very heavy memory
pressure. We will need to make sure that they don't allocate (much) memory, don't
take any locks which might deadlock with other memory allocations etc.
It will potentially require hard restrictions on what these programs can and can't
do and this is something that the bpf community will have to maintain long-term.

Right, BPF callbacks operating under OOM situations will be really
constrained but this is more or less by definition. Isn't it?

What do you mean?

Callbacks cannot depend on any direct or indirect memory allocations.
Dependencies on any sleeping locks (again, direct or indirect) are not
allowed either, just to name the most important ones.

In general, the bpf community is trying to make it as generic as possible and
keeps adding new features. Bpf programs are not as constrained as they were
when it all started.

bpf supports different running contexts. For example, only a non-sleepable bpf prog is allowed to run in the NIC driver. A sleepable bpf prog is only allowed to run at certain bpf_lsm hooks that are known to be safe for calling blocking bpf helpers/kfuncs. The bpf side ensures that a non-sleepable bpf prog cannot do things that may block.
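
To make the rule above concrete, here is a rough userspace model of that kind of load-time check. It is only a sketch, not kernel code: the names prog_ctx, helper and prog_is_loadable are made up for illustration, and the helper names in the usage note are used purely as labels.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the context a program is attached to. */
enum prog_ctx {
	CTX_NON_SLEEPABLE,	/* e.g. a prog running in the NIC driver */
	CTX_SLEEPABLE,		/* e.g. a prog at a hook known to be safe for blocking */
};

/* Hypothetical model: a helper/kfunc and whether calling it may block. */
struct helper {
	const char *name;
	bool might_sleep;
};

/*
 * Return true only if every helper the program calls is permitted in
 * its context. A helper that may block is rejected unless the program
 * is sleepable; helpers that never block are fine in both contexts.
 */
static bool prog_is_loadable(enum prog_ctx ctx,
			     const struct helper *calls, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (calls[i].might_sleep && ctx != CTX_SLEEPABLE)
			return false;
	}
	return true;
}
```

With a call list such as { "bpf_map_lookup_elem", false } followed by { "bpf_copy_from_user", true }, this model rejects the program in CTX_NON_SLEEPABLE and accepts it in CTX_SLEEPABLE. An OOM hook would presumably fall into the restrictive category, so a prog attached there simply could not reach anything that may block.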

fwiw, Dave has recently proposed something for iterating the task vma (https://lore.kernel.org/bpf/20230810183513.684836-4-davemarchevsky@xxxxxx/). Potentially, a similar iterator can be created for a bpf program to iterate cgroups and tasks.