Re: Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task

From: Abel Wu
Date: Thu Aug 10 2023 - 00:01:15 EST


On 8/9/23 3:53 PM, Michal Hocko wrote:
> On Tue 08-08-23 14:41:17, Roman Gushchin wrote:
>> It would be also nice to come up with some practical examples of bpf programs.
>> What are meaningful scenarios which can be covered with the proposed approach
>> and are not covered now with oom_score_adj.
>
> Agreed here as well. This RFC serves purpose of brainstorming on all of
> this.
>
> There is a fundamental question whether we need BPF for this task in the
> first place. Are there any huge advantages to export the callback and
> allow a kernel module to hook into it?

The ancient oom-killer largely depends on memory usage when choosing
victims, which might not fit the needs of modern scenarios. It's common
nowadays that multiple workloads (tenants) with different 'priorities'
run together, and the decisions made by the oom-killer don't always
obey the service level agreements.

The oom_score_adj, meanwhile, only adjusts the usage-based decisions, so
it can hardly be translated into 'priority' semantics. How can we properly
configure it given that we don't know how much memory the workloads
will use? It's really hard for a static strategy to deal with dynamic
provisioning. IMHO the oom_score_adj is just another demon.
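
To make the point concrete, here is a rough user-space model (not kernel
code) of how a static oom_score_adj interacts with a usage-based score.
It assumes the score has the shape "pages used + adj * (total pages / 1000)",
which follows the general form of oom_badness() but ignores many details.
The tenant names, page counts and adj values are made up for illustration.

```python
# Simplified model of a usage-based OOM score with a static adjustment.
# Assumption: score = used pages + oom_score_adj * (total pages / 1000).

OOM_SCORE_ADJ_MIN = -1000  # tasks at this value are never selected


def badness(used_pages, oom_score_adj, total_pages):
    """Return a score; the task with the highest score is killed first."""
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None  # exempt from selection
    return used_pages + oom_score_adj * (total_pages // 1000)


def pick_victim(tasks, total_pages):
    """tasks: dict mapping name -> (used_pages, oom_score_adj)."""
    scored = {
        name: badness(used, adj, total_pages)
        for name, (used, adj) in tasks.items()
    }
    candidates = {name: s for name, s in scored.items() if s is not None}
    return max(candidates, key=candidates.get)


TOTAL_PAGES = 1_000_000  # hypothetical machine size

# Hypothetical tenants: "batch" is low priority, "frontend" is high priority.
# adj=300 on "batch" is intended to make it the preferred victim.
light_load = {"frontend": (200_000, 0), "batch": (100_000, 300)}
heavy_load = {"frontend": (500_000, 0), "batch": (100_000, 300)}

print(pick_victim(light_load, TOTAL_PAGES))  # batch
print(pick_victim(heavy_load, TOTAL_PAGES))  # frontend
```

With the same static adj, the intended victim is chosen only while the
high-priority tenant stays small. Once its usage grows past the fixed
offset, it becomes the victim instead.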

Reworking the oom-killer's internal algorithm or patching some random
metrics may satisfy the immediate needs, but for the next 10 years? I
doubt it. So I think we do need the flexibility to bypass the legacy
usage-based algorithm, through bpf or pre-select interfaces.
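
As a sketch of what bypassing the usage-based algorithm could mean in
practice, the following models a priority-first policy in plain Python.
It is not the interface proposed in this RFC. The 'priority' field and
the task list are hypothetical, and memory usage serves only as a
tie-breaker among tasks of equal priority.

```python
# Priority-first victim selection (illustrative model, not kernel code).
# Assumption: each task carries an explicit 'priority', lower = less important.


def select_victim(tasks):
    """Pick the lowest-priority task; among equals, the largest memory user."""
    victim = min(tasks, key=lambda t: (t["priority"], -t["used_pages"]))
    return victim["name"]


# Hypothetical workload mix.
tasks = [
    {"name": "frontend", "priority": 10, "used_pages": 500_000},
    {"name": "batch-a", "priority": 1, "used_pages": 100_000},
    {"name": "batch-b", "priority": 1, "used_pages": 150_000},
]

print(select_victim(tasks))  # batch-b
```

Here the outcome does not change however much memory the high-priority
tenant consumes, which is the property a static usage offset cannot give.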

Regards,
Abel