Re: Improving OOM killer

From: David Rientjes
Date: Thu Feb 04 2010 - 16:49:13 EST


On Wed, 3 Feb 2010, Rik van Riel wrote:

> > Do you have any comments about the forkbomb detector or its threshold that
> > I've put in my heuristic? I think detecting these scenarios is still an
> > important issue that we need to address instead of simply removing it from
> > consideration entirely.
>
> I believe that malicious users are best addressed in person,
> or preemptively through cgroups and rlimits.
>

Forkbombs need not be the result of malicious users.

> Having a process with over 500 children is quite possible
> with things like apache, Oracle, postgres and other forking
> daemons.
>

It's clear that the forkbomb threshold would need to be definable from
userspace and probably default to something high such as 1000.

Keep in mind that we're in the oom killer here, though. So we're out of
memory and we need to kill something; should Apache, Oracle, and postgres
not be penalized for their cost of running by factoring in something like
this?

(lowest rss size of children) * (# of first-generation children) /
(forkbomb threshold)
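
As a rough illustration of the formula above, here is a minimal userspace
sketch in C. It is not kernel code: the function name `forkbomb_penalty`, the
array-of-rss interface, and the choice of pages as the unit are all
assumptions made for the example, and the expression is applied exactly as
written, with no cutoff below the threshold.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an existing kernel function).
 *
 * child_rss   - rss of each first-generation child, assumed to be in pages
 * nr_children - number of first-generation children
 * threshold   - forkbomb threshold, e.g. a userspace-defined value like 1000
 *
 * Returns (lowest child rss) * (number of children) / threshold,
 * or 0 when there are no children or the threshold is 0.
 */
unsigned long forkbomb_penalty(const unsigned long *child_rss,
                               size_t nr_children,
                               unsigned long threshold)
{
    unsigned long long scaled;
    unsigned long min_rss;
    size_t i;

    /* A NULL array is only acceptable when there are no children. */
    assert(child_rss != NULL || nr_children == 0);

    if (nr_children == 0 || threshold == 0)
        return 0;

    /* Find the smallest rss among the first-generation children. */
    min_rss = child_rss[0];
    for (i = 1; i < nr_children; i++) {
        if (child_rss[i] < min_rss)
            min_rss = child_rss[i];
    }

    /* Use a wider intermediate so the multiplication is less likely to overflow. */
    scaled = (unsigned long long)min_rss * nr_children;
    return (unsigned long)(scaled / threshold);
}
```

With integer division, a parent with only a handful of small children gets a
penalty of 0, while the penalty grows in proportion to the number of children
as it approaches and passes the threshold.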

> Killing the parent process can result in the service
> becoming unavailable, and in some cases even data
> corruption.
>

There's only one possible remedy for that, which is OOM_DISABLE; the oom
killer cannot possibly predict data corruption as the result of killing a
process and this is no different. Everything besides init, kthreads,
OOM_DISABLE threads, and threads that do not share the same cpuset, memcg,
or set of allowed mempolicy nodes is a candidate for oom kill.
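
The eligibility rule in the paragraph above can be sketched as a simple
predicate. This is a simplified userspace model, not the kernel's
implementation: `struct task_view` and `oom_candidate` are hypothetical names,
and the cpuset, memcg, and mempolicy checks are collapsed into one assumed
boolean. OOM_DISABLE is the oom_adj value of -17 used by kernels of this era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_DISABLE (-17)   /* oom_adj value that exempts a task from oom kill */

/* Hypothetical, simplified view of a task for illustration only. */
struct task_view {
    int  pid;
    bool is_kthread;         /* kernel thread */
    int  oom_adj;            /* per-task oom adjustment */
    bool shares_constraint;  /* assumed flag: shares the same cpuset, memcg, or
                                set of allowed mempolicy nodes */
};

/* Returns true if the task is a candidate for oom kill under the rule above. */
bool oom_candidate(const struct task_view *t)
{
    assert(t != NULL);

    if (t->pid == 1)                 /* init */
        return false;
    if (t->is_kthread)               /* kthreads */
        return false;
    if (t->oom_adj == OOM_DISABLE)   /* explicitly exempted */
        return false;
    if (!t->shares_constraint)       /* outside the constrained set */
        return false;
    return true;
}
```

Under this model a forking daemon is exempt only if it has been given
OOM_DISABLE; nothing else about the task removes it from consideration.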