Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()

From: David Rientjes
Date: Wed Feb 17 2016 - 17:42:48 EST


On Wed, 17 Feb 2016, Xishi Qiu wrote:

> Hi David,
>
> Thanks for your advice.
>
> I have a stupid question, what's the main difference between lmk and oom?

Hi Xishi, it's not a stupid question at all!

The low memory killer (LMK) appears to be implemented as a generic shrinker
that iterates through the tasklist and tries to free memory before the
generic oom killer runs. It has two tunables, "adj" and "minfree":
"minfree" sets the free-page thresholds at which a class of processes
becomes eligible to be killed, and "adj" defines each class by its
oom_score_adj values.
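
As a rough illustration of how the two tunables pair up, here is a
userspace sketch in C. The array names mirror the tunables, but the values,
the helper name lmk_min_score_adj(), the LMK_NO_KILL sentinel and the single
free_pages input are assumptions made for illustration, not the driver's
actual code.

```c
#include <stddef.h>

/* Illustrative thresholds only; the real values are tunable at runtime. */
static const short lowmem_adj[]     = { 0, 1, 6, 12 };
static const long  lowmem_minfree[] = { 1536, 2048, 4096, 16384 }; /* pages */

/* Hypothetical sentinel: higher than any valid oom_score_adj (max 1000). */
#define LMK_NO_KILL 1001

/*
 * Return the lowest oom_score_adj that is eligible to be killed for the
 * given number of free pages, or LMK_NO_KILL when no threshold is crossed.
 * The first (smallest) minfree threshold that free_pages falls below wins.
 */
static int lmk_min_score_adj(long free_pages)
{
    size_t i;
    size_t n = sizeof(lowmem_minfree) / sizeof(lowmem_minfree[0]);

    for (i = 0; i < n; i++) {
        if (free_pages < lowmem_minfree[i])
            return lowmem_adj[i];
    }
    return LMK_NO_KILL;
}
```

With these example values, 5000 free pages makes only tasks with
oom_score_adj >= 12 eligible, while 1000 free pages makes every task with
oom_score_adj >= 0 eligible.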

So LMK is trying to free memory before all memory is depleted based on
heuristics for systems that load the driver whereas the generic oom killer
is called to kill a process when reclaim has failed to free any memory and
there's no forward progress.

> 1) lmk is called when reclaim memory, and oom is called when alloc failed in slow path.

Yeah, and I don't think LMK provides any sort of guarantee against all
memory being fully depleted before it can run, so it would probably be
best effort.

> 2) lmk has several lowmem thresholds and oom is not.

Right, and it abuses oom_score_adj, which is a generic oom killer tunable,
to define kill priorities at different levels of memory availability.

> 3) others?
>

LMK also abuses TIF_MEMDIE, which is used by the generic oom killer to
allow a process to free memory. Since the system is out of memory when the
oom killer is called, a process often needs additional memory to even exit,
so we set TIF_MEMDIE to ignore zone watermarks in the page allocator. LMK
should not be using this; there should already be memory available for it
to allocate from.
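
To make the watermark point concrete, here is a deliberately simplified
userspace model in C. It is not the page allocator's code: the function
name toy_alloc_ok(), the memdie flag and the page counts are hypothetical,
and the real check considers more than this.

```c
#include <stdbool.h>

/*
 * Toy model: an allocation of `request` pages normally succeeds only if
 * the zone stays at or above its minimum watermark afterwards. A task
 * flagged as dying (the role TIF_MEMDIE plays) skips that check and may
 * dip into the reserve below the watermark.
 */
static bool toy_alloc_ok(long free_pages, long min_watermark,
                         long request, bool memdie)
{
    if (request > free_pages)
        return false;   /* not enough pages left, reserve or not */
    if (memdie)
        return true;    /* watermark ignored for the dying task */
    return free_pages - request >= min_watermark;
}
```

In this model a task without the flag is refused once free pages drop
below the watermark, which is the out-of-memory case; a killer that runs
while free pages are still above the watermark never needs the bypass.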

To fix these issues with LMK, I think it should:

 - send SIGKILL to terminate a process in lowmem situations without
   setting TIF_MEMDIE, and implement its own way of determining when to
   kill additional processes, and

 - introduce its own tunable to define the priority of kill when it runs
   rather than using oom_score_adj, which is a proportion of memory to
   bias against, not a priority at all.
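
To show why oom_score_adj is a proportion rather than a priority, here is
a small userspace sketch in C, loosely modelled on how the generic oom
killer scores a task. The function name toy_badness() and its parameters
are assumptions for illustration; the kernel's own calculation has more
special cases.

```c
/*
 * Simplified badness score: the task's footprint in pages (rss, swap
 * entries, page tables), biased by oom_score_adj expressed in
 * thousandths of total memory. A score is never allowed below 1.
 */
static long toy_badness(long rss, long swapents, long pgtable_pages,
                        int oom_score_adj, long totalpages)
{
    long points = rss + swapents + pgtable_pages;

    /* oom_score_adj of 1000 adds the equivalent of all of memory. */
    points += (long)oom_score_adj * (totalpages / 1000);

    return points > 0 ? points : 1;
}
```

With totalpages = 1000000, a 50000-page task with oom_score_adj of 100
scores 150000, which is still below a 200000-page task with oom_score_adj
of 0. A higher oom_score_adj therefore does not by itself mean "kill
first", which is why a separate priority tunable would suit LMK better.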