Re: Some questions about linux kernel.

From: James Sutherland (jas88@cam.ac.uk)
Date: Wed Mar 15 2000 - 10:32:03 EST


On Wed, 15 Mar 2000, Sean Hunter wrote:

> On Tue, Mar 14, 2000 at 02:40:27PM -0300, Rik van Riel wrote:
> > On Mon, 13 Mar 2000, Peter T. Breuer wrote:
> >
> > > > For the time being, though, the simple "kill the most likely suspect"
> > > > patch should be a pretty good approximation.
> > >
> > > Nope. It isn't. It's unacceptable in practice.
> > >
> > > I run labs. The lab machines have to have defenses against
> > > _themselves_.
> > >
> > > Even so, I get reports every couple of days from machines that
> > > have obviously killed themselves.
> >
> > You don't run the OOM killer. You haven't tried it.
> > Yet you want to judge it without any knowledge about
> > it...
>
> Could I just stress one point which Rik and others keep making and
> people keep ignoring? The in-kernel OOM killer is not (meant to be) a
> finely-honed user-customisable tool. Its a very effective thing when
> the memory state is completely dire. It should never run, and when it
> does, its the last-ditch defence and if it wasn't there your system
> would die anyway.

Precisely - complaining that you would rather have your box die
uncontrollably, because you are sitting there holding its hand 24x7 to
watch the lights go out, is hardly productive. If you ARE monitoring it
closely enough to fix things, you would fix them long before the disaster
recovery OOM routine cuts in!

> If you want something clever and/or subtle, you need to run it in
> userland _as well_ as the in-kernel oom killer, and if the userland
> thingy does its job, the in-kernel killer should never need to run.

Plenty of people have CPU leak detectors, which will renice (and later
kill) processes which take more than a certain length of time. This is an
absolute godsend for stopping things like Netscape which have a horrible
habit of going into a busy loop, eating 100% of CPU time until killed.

Something similar to implement crude per-user quotas could be useful -
just check periodically how much RAM each user is using, and tell them if
they exceed their quota (a "soft" quota) - would probably address the
"normal" low-memory situation (exploding Netscapes etc.)

> I think Rik gets a ridiculous amount of flak from people who can't be
> bothered to try his patch or listen to what he and others are trying
> to do, but want to mouth off anyway.

Agreed. Complaining about Rik's patch "killing my poor defenceless
process, which had blown up for a good reason - I really DID want it to
exhaust all my swapspace and then kill itself, not be killed by the big
bad nasty kernel!" is ridiculous. The processes it kills will almost
always be killed because they are causing serious problems for the system.
A process only gets killed in this way to protect overall system
integrity, not as a matter of routine housekeeping.

James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:16 EST