Re: Killing/balancing processes when overcommitted

From: Helge Hafting (helgehaf@aitel.hist.no)
Date: Thu Sep 12 2002 - 03:26:30 EST


Jurriaan wrote:
>
> From: Jim Sibley <jlsibley@us.ibm.com>
> Date: Wed, Sep 11, 2002 at 11:08:43AM -0700
> > 1 - cpu usage may not be a good measure
> > 2 - Large memory tasks may not be a good measure
> > 3 - Measuring memory by task is misleading
> > 4 - Niceness is not really useful in a multi-user environment.
> > 5 - Other numerical limits tend to be arbitrary.
>
> I was just thinking (feel free to point out the error of my ways):
>
> what if we used the time a program was started as a guide? The last
> programs started are killed off first.
>
> That would mean that init survives to the last, as would the daemons
> that are started when booting.

And if one of your daemons has a slow memory leak, then this happens:
You go OOM after a while (hours, days) and a user program is killed.
But the leaky daemon is still running, so after a shorter time you go
OOM again. Another user program is killed. This goes on for a while,
and it becomes hard to log in to fix things, because the freshly
logged-in administrator has a very new process and is the first to go!

After a while, all user programs are gone and the daemons die one by
one until the offending one goes. Or perhaps the offending daemon
doesn't leak anymore: it might be sshd, but there is not enough memory
to log in, so it doesn't get to leak any more.
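To make that failure mode concrete, here is a toy simulation, a sketch in Python and not kernel code. Everything in it is hypothetical: the process names, memory sizes and leak rate, and the "largest process" picker, which is only a crude stand-in for a memory-based heuristic and not the kernel's actual OOM scoring. It compares the proposed newest-first rule against killing the largest process.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    start_time: int         # hypothetical start timestamp; larger = started later
    rss: int                # memory in arbitrary units
    leak_per_tick: int = 0  # memory gained per tick (0 = no leak)


def pick_newest_first(procs):
    """Proposed policy: kill the most recently started process."""
    return max(procs, key=lambda p: p.start_time)


def pick_largest(procs):
    """Crude stand-in for a memory-size heuristic: kill the biggest process."""
    return max(procs, key=lambda p: p.rss)


def simulate(procs, total_memory, ticks, pick_victim):
    """Let leaky processes grow each tick; whenever total usage exceeds
    total_memory, kill victims chosen by pick_victim until it fits again.
    Returns the names of the killed processes, in kill order."""
    alive = list(procs)
    killed = []
    for _ in range(ticks):
        for p in alive:
            p.rss += p.leak_per_tick
        while alive and sum(p.rss for p in alive) > total_memory:
            victim = pick_victim(alive)
            alive.remove(victim)
            killed.append(victim.name)
    return killed


def boot():
    """Hypothetical system: old daemons (one of them leaky) plus newer user work."""
    return [
        Proc("init", 0, 1),
        Proc("leakyd", 1, 10, leak_per_tick=5),
        Proc("sshd", 2, 5),
        Proc("user1", 100, 20),
        Proc("user2", 101, 20),
        Proc("admin_shell", 200, 5),
    ]


if __name__ == "__main__":
    print("newest first:", simulate(boot(), 100, 20, pick_newest_first))
    print("largest first:", simulate(boot(), 100, 20, pick_largest))
```

With these made-up numbers, newest-first kills admin_shell, user2, user1, sshd and only then leakyd (just init outlives it), which is the sequence described above; the largest-first picker removes leakyd at the first OOM.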
>
> Alternatively, suppose we get a very large pid-space, and at the end of
> booting there's something like
>
> echo "5000" > /proc/sys/minimum-pid-from-here-on
>
> Then, you could do:
>
> echo "5000" > /proc/sys/oom_lowest_pid_to_try_killing_first

Again, a bad daemon (pid < 5000) will slowly take out everything else,
with login impossible in the meantime.

> in other words, protect a part of pid-space against oom-killing.

Any scheme that protects a set of processes can fail if the bad one
is among them. Also, the OOM killer will have to fall back to the
standard heuristic whenever only protected processes are left.
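A minimal sketch of the protected-PID idea and the fallback it needs, assuming a simple "largest process" rule as the standard heuristic (the real kernel scoring weighs more than size). The cutoff mirrors the hypothetical oom_lowest_pid_to_try_killing_first knob from the quoted proposal; the pids and sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    rss: int  # memory in arbitrary units


# Hypothetical cutoff mirroring the proposed
# /proc/sys/oom_lowest_pid_to_try_killing_first setting.
PROTECT_BELOW_PID = 5000


def standard_heuristic(procs):
    """Simplified stand-in for the normal OOM choice: the largest process."""
    return max(procs, key=lambda p: p.rss)


def pick_victim(procs, protect_below=PROTECT_BELOW_PID):
    """Kill from the unprotected range (pid >= protect_below) when possible;
    once only protected processes remain, fall back to the standard heuristic."""
    unprotected = [p for p in procs if p.pid >= protect_below]
    return standard_heuristic(unprotected if unprotected else procs)


if __name__ == "__main__":
    # pid 812 is a (hypothetical) leaking daemon inside the protected range.
    procs = [Proc(1, 1), Proc(812, 900), Proc(5123, 30), Proc(6001, 50)]
    while len(procs) > 1:  # spare pid 1 in this toy run
        victim = pick_victim(procs)
        print("killing pid", victim.pid)
        procs.remove(victim)
```

The 900-unit hog at pid 812 outlives every unprotected process and is only chosen once the fallback kicks in, which is exactly the weakness of protecting a whole range.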

Helge Hafting


