[RFC PATCH] Android OOM helper proof of concept

From: peter enderborg
Date: Thu Apr 22 2021 - 08:33:56 EST


On 4/21/21 4:29 PM, Michal Hocko wrote:
> On Wed 21-04-21 06:57:43, Shakeel Butt wrote:
>> On Wed, Apr 21, 2021 at 12:16 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>> [...]
>>>> To decide when to kill, the oom-killer has to read a lot of metrics.
>>>> It has to open a lot of files to read them and there will definitely
>>>> be new allocations involved in those operations. For example reading
>>>> memory.stat does a page size allocation. Similarly, to perform action
>>>> the oom-killer may have to read cgroup.procs file which again has
>>>> allocation inside it.
>>> True but many of those can be avoided by opening the file early. At
>>> least seq_file based ones will not allocate later if the output size
>>> doesn't increase. Which should be the case for many. I think it is a
>>> general improvement to push those who allocate during read to an open
>>> time allocation.
>>>
>> I agree that this would be a general improvement but it is not always
>> possible (see below).
> It would be still great to invest into those improvements. And I would
> be really grateful to learn about bottlenecks from the existing kernel
> interfaces you have found on the way.
>
>>>> Regarding sophisticated oom policy, I can give one example of our
>>>> cluster level policy. For robustness, many user facing jobs run a lot
>>>> of instances in a cluster to handle failures. Such jobs are tolerant
>>>> to some amount of failures but they still have requirements to not let
>>>> the number of running instances below some threshold. Normally killing
>>>> such jobs is fine but we do want to make sure that we do not violate
>>>> their cluster level agreement. So, the userspace oom-killer may
>>>> dynamically need to confirm if such a job can be killed.
>>> What kind of data do you need to examine to make those decisions?
>>>
>> Most of the time the cluster level scheduler pushes the information to
>> the node controller which transfers that information to the
>> oom-killer. However based on the freshness of the information the
>> oom-killer might request to pull the latest information (IPC and RPC).
> I cannot imagine any OOM handler to be reliable if it has to depend on
> other userspace component with a lower resource priority. OOM handlers
> are fundamentally complex components which has to reduce their
> dependencies to the bare minimum.


I think we very much need a OOM killer that can help out,
but it is essential that it also play with android rules.

This is RFC patch that interact with OOM