Re: OOM band-aid

Rik van Riel (H.H.vanRiel@phys.uu.nl)
Thu, 8 Oct 1998 20:16:01 +0200 (CEST)


On Thu, 8 Oct 1998, Clayton Weaver wrote:

> Not to belittle anyone's efforts to get more efficient use of the
> system and stave off the big implosion, but the "kill something,
> glean some stale pages that we don't really need but that are still
> marked as in use" seems kind of stop-gap to me. It has this
> vagueness and ambiguity to it. Maybe that's just because I'm not
> reading the vm code while I think about it, but "what to kill" seems
> to me insoluble.

Please look at my patch. I made up a somewhat simple but
effective algorithm that seems to kill the right process
9 out of 10 times on all the testers' machines.

> This seems to me a sort of poking at it with a stick blindfolded
> solution.

Not really; you have quite a lot of data available to
base the choice on: program size, CPU time used, time
running, (non-)suid, root, IOPL, etc.

> What about a dynamic swap file with no size limit other than
> partition space wherever /var/tmp or /tmp live?

This means suspending the problem until you hit that limit.
I.e., it's not a real solution.

> All you do with it is suspend things into it. Set a limit where it
> kicks in (90% of ram + fixed swap in use, whatever), and start
> suspending user space processes until the usage stays under that
> level for some timer count.

Even worse, a suspended process can't finish and clean
up after itself. We've had this discussion a dozen times,
and process suspension has always come out as the worst
option for any Unix-like system.

> If you kill one accidentally by suspending it, that isn't any worse
> than if your OOM algorithm killed it on purpose,

Not true: suspending it doesn't save the system, because
you still haven't got rid of it.

> and you will end up saving a lot of people's work that the algorithm
> might have lost otherwise.

People who run huge simulations will accept the fact that
their process will be killed if it behaves badly. Note,
however, that my OOM killing code also takes into account
CPU time used and running time, so it won't kill a
well-behaved, old simulation when there's a new
exploding-netscape in sight...
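
To make the idea concrete, here is a minimal sketch in C of how
such a score could combine the inputs mentioned above (size, CPU
time, running time, root, IOPL). It is not the code from the
patch: the struct, its field names, the helper functions and all
the weights are assumptions chosen purely for illustration.

```c
#include <stddef.h>

/* Hypothetical per-process snapshot; all field names are illustrative. */
struct proc_info {
    const char *name;
    unsigned long mem_pages;    /* program size in pages */
    unsigned long cpu_seconds;  /* CPU time used so far */
    unsigned long run_seconds;  /* time since the process started */
    int is_root;                /* runs as root or setuid */
    int has_iopl;               /* has raw I/O port access */
};

/* Integer square root (floor); avoids floating point and overflow. */
static unsigned long isqrt(unsigned long x)
{
    unsigned long r = 0;
    while (r + 1 <= x / (r + 1))
        r++;
    return r;
}

/* Higher score = better candidate to kill. The weights are assumptions. */
static unsigned long badness(const struct proc_info *p)
{
    unsigned long score = p->mem_pages;

    /* A process that has already done a lot of work is worth keeping. */
    score /= isqrt(p->cpu_seconds) + 1;
    score /= isqrt(isqrt(p->run_seconds)) + 1;

    /* Be more careful with privileged or hardware-touching processes. */
    if (p->is_root)
        score /= 4;
    if (p->has_iopl)
        score /= 4;

    return score;
}

/* Return the process with the highest score, or NULL if none scores > 0. */
static const struct proc_info *pick_victim(const struct proc_info *procs,
                                           size_t n)
{
    const struct proc_info *victim = NULL;
    unsigned long worst = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long s = badness(&procs[i]);
        if (s > worst) {
            worst = s;
            victim = &procs[i];
        }
    }
    return victim;
}
```

With weights like these, a big simulation that has been running
for days scores far lower than a freshly started process of
similar size with almost no CPU time behind it, so the newcomer
is the one that gets picked.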

> You can still get to the point where you run out of disk space on
> your designated failsafe dynamic swap file partition,

And what are you going to do then?

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
