Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()

From: David Rientjes
Date: Tue Feb 16 2016 - 19:35:47 EST


On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:

> On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> > Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
> > But usually smart phones enable zram, so swap space actually use ram.
>
> Yes, but does that matter for this type of calculation? I need an ack
> from the android team before I could ever take such a core change to
> this code...
>

The calculation proposed in this patch is the same one the generic oom
killer uses: an estimate of the amount of memory that will be freed if
the process is killed and can exit. This is better than simply using
get_mm_rss().
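
To make the difference concrete, here is a rough userspace sketch (not code
from the patch) of the two estimates. struct mm_model and both function
names are hypothetical stand-ins for the kernel's per-mm counters; the
assumption is only that the estimate sums resident pages, swap entries and
page table pages.

```c
/* Hypothetical model of the per-mm counters; not a kernel structure. */
struct mm_model {
	unsigned long rss_pages;        /* resident pages, as from get_mm_rss() */
	unsigned long swap_entries;     /* swapped-out pages (MM_SWAPENTS) */
	unsigned long page_table_pages; /* pages holding the page tables */
};

/* Estimate that only counts resident memory. */
static unsigned long tasksize_rss_only(const struct mm_model *mm)
{
	return mm->rss_pages;
}

/* Estimate that also counts swap entries and page tables. */
static unsigned long tasksize_estimate(const struct mm_model *mm)
{
	return mm->rss_pages + mm->swap_entries + mm->page_table_pages;
}
```

With zram, a task with 1000 resident pages, 4000 swap entries and 20 page
table pages looks like 1000 pages to the first function and 5020 pages to
the second, even though the swapped pages still occupy ram.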

However, I think we seriously need to reconsider the implementation of
the lowmem killer entirely. It currently abuses TIF_MEMDIE, which should
ideally only be set for one thread on the system since it allows
unbounded access to global memory reserves.

It also abuses the user-visible /proc/self/oom_score_adj tunable: this
tunable is used by the generic oom killer to bias or discount a proportion
of memory from a process's usage. This is the only supported semantic of
the tunable. The lowmem killer uses it as a strict prioritization, so any
process with oom_score_adj higher than another process is preferred for
kill, REGARDLESS of memory usage. This leads to priority inversion: the
user cannot always arrange for the same process to be killed by both the
generic oom killer and the lowmem killer. This is what happens when a
tunable with a very clear and defined purpose is used for other reasons.
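
A simplified userspace model of the two selection policies shows the
inversion. Everything here is an illustrative assumption: struct proc_model
and the function names are hypothetical, the generic scoring is reduced to
"usage plus oom_score_adj as a per-mille share of total pages", and the
lowmem policy is reduced to "highest oom_score_adj wins, size only breaks
ties".

```c
#include <stddef.h>

/* Hypothetical per-process model; not a kernel structure. */
struct proc_model {
	const char *name;
	long tasksize;      /* estimated pages freed if killed */
	int oom_score_adj;  /* -1000..1000 */
};

/* Generic-style score: usage biased by adj/1000 of total memory. */
static long generic_badness(const struct proc_model *p, long totalpages)
{
	long points = p->tasksize + (long)p->oom_score_adj * (totalpages / 1000);

	return points > 0 ? points : 1;
}

/* Pick the process with the highest biased score. */
static const struct proc_model *pick_generic(const struct proc_model *procs,
					     size_t n, long totalpages)
{
	const struct proc_model *sel = NULL;
	long best = 0;

	for (size_t i = 0; i < n; i++) {
		long points = generic_badness(&procs[i], totalpages);

		if (!sel || points > best) {
			sel = &procs[i];
			best = points;
		}
	}
	return sel;
}

/* Lowmem-style pick: strict oom_score_adj priority, size breaks ties. */
static const struct proc_model *pick_lowmem(const struct proc_model *procs,
					    size_t n, int min_score_adj)
{
	const struct proc_model *sel = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct proc_model *p = &procs[i];

		if (p->oom_score_adj < min_score_adj)
			continue;
		if (sel) {
			if (p->oom_score_adj < sel->oom_score_adj)
				continue;
			if (p->oom_score_adj == sel->oom_score_adj &&
			    p->tasksize <= sel->tasksize)
				continue;
		}
		sel = p;
	}
	return sel;
}
```

Take 100000 total pages, a 50000-page process with oom_score_adj 0 and a
1000-page process with oom_score_adj 100. The generic-style scores are
50000 and 11000, so the large process is chosen; the lowmem-style policy
chooses the small one purely because its oom_score_adj is higher.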

I'd seriously consider not accepting any additional hacks on top of this
code until the implementation is rewritten.