Re: Some questions about linux kernel.

From: James Sutherland (jas88@cam.ac.uk)
Date: Mon Mar 13 2000 - 06:42:58 EST


On Mon, 13 Mar 2000, Andrea Arcangeli wrote:

> On Sun, 12 Mar 2000, Rik van Riel wrote:
>
> >You might want to take a look at the process selection
> >mechanism in my OOM killer patch (http://www.surriel.com/patches/).
>
> I read the process-selection code of the OOM killer in 2.2.15pre12,
> and it can be described this way:
>
> "try _not_ to kill tasks that were started a long time ago, that
> have used lots of CPU resources, that are running non-reniced, and
> that are running as root or with privileges"
>
> The heuristic has _no way_ to find out which task is the hog. This in
> turn means that you can kill several wrong tasks before you finally
> kill the right one, so it's useless and worse than what we have now,
> as far as I can tell.

Hardly. The current situation amounts to "Kill processes completely
randomly until we hit the right one."
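
For reference, the patch's selection boils down to scoring tasks and
killing the highest scorer. A rough sketch of that kind of heuristic
(illustrative only, not Rik's actual code: int_sqrt() is an assumed
integer square-root helper, and the field names and weightings are
2.2-era guesses on my part):

static unsigned long badness(struct task_struct *p)
{
        unsigned long points;

        if (!p->mm)
                return 0;       /* kernel thread: never a candidate */

        /* Baseline: size of the address space, so big tasks score high. */
        points = p->mm->total_vm;

        /* Heavy CPU use and a long lifetime suggest an innocent task. */
        points /= int_sqrt(p->times.tms_utime + p->times.tms_stime + 1);
        points /= int_sqrt(int_sqrt((jiffies - p->start_time) + 1));

        /* Reniced tasks are fairer game... */
        if (p->priority < DEF_PRIORITY)
                points *= 2;

        /* ...and root gets the benefit of the doubt. */
        if (p->uid == 0 || p->euid == 0)
                points /= 4;

        return points;
}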

> If you hit a feature in a daemon that incidentally causes it to grow
> at the maximal rate, you'll end up killing lots of innocents for no
> good reason.

At present, yes. WITH the patch in, it is likely to be killed very soon.

> Your objective is defensive: you "try not to kill" something that
> looks more important.
>
> I instead believe that we should be aggressive against the _hog_,
> rather than defensive about some task that may look non-malicious
> (but that you don't know to be innocent).
>
> Once we are able to find out which task is the hog, there will be no
> need to look at the information that you are using in your
> task-selection algorithm. We know we have to kill the hog, regardless
> of its euid/priority/lifetime etc...

But HOW do we identify the "hog"? It may well not even be an individual
process - it could be a group of processes.

> The idea I had a few weeks ago to solve the problem, and so to find
> the hog (and that I'll experiment with in real-life 2.3.x soon), is
> to add a per-task page fault rate (a la avg_slice). Once we know the
> page fault rate and the time of the last fault for each process,
> we'll almost be able to find the memory hog without mistakes, and we
> won't need anything else.
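
Sketched as code, that bookkeeping might look something like this
(fault_rate and last_fault would be hypothetical new task_struct
fields, updated once per fault from the fault handler; the 7/8 decay
constant is arbitrary):

void update_fault_rate(struct task_struct *p)
{
        unsigned long now = jiffies;
        unsigned long interval = now - p->last_fault;

        /* Decay the old rate, then add the instantaneous rate implied
         * by this fault (roughly faults per second), a la avg_slice. */
        p->fault_rate = (p->fault_rate * 7) / 8 + HZ / (interval + 1);
        p->last_fault = now;
}

A task faulting hard and continuously would then stand out with a
persistently high fault_rate, while a big-but-idle task would not.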

Yes, once we can use the rate of allocation rather than the absolute
volume, we are more likely to hit the problem task earlier on.
For now, however, killing the biggest task (with a few refinements:
avoiding killing root processes where possible, etc.) is a reasonable
approximation, along the lines of the sketch below.
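
Something like this would do for the interim approach (again just a
sketch, using 2.2-style task iteration; the non-root preference is the
"few refinements" mentioned above):

static struct task_struct *select_victim(void)
{
        struct task_struct *p, *best = NULL;
        unsigned long best_vm = 0;
        int best_is_root = 1;

        read_lock(&tasklist_lock);
        for_each_task(p) {
                int is_root;
                unsigned long vm;

                if (!p->mm || p->pid == 1)
                        continue;       /* skip kernel threads and init */
                is_root = (p->euid == 0);
                vm = p->mm->total_vm;

                /* Any non-root task beats any root task; within the
                 * same class, the bigger address space wins. */
                if ((best_is_root && !is_root) ||
                    (best_is_root == is_root && vm > best_vm)) {
                        best = p;
                        best_vm = vm;
                        best_is_root = is_root;
                }
        }
        read_unlock(&tasklist_lock);
        return best;            /* NULL only if nothing is killable */
}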

> I completely agree with James (I quote him):
>
> "When you OOM, it's (typically) the case where it's a single process
> that's going crazy and being a huge memory hog. Killing other
> processes ahead of it won't typically mean very much, as you'll
> ^^^^^^
> have to kill more until you finally get to the
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> spiraling-out-of-control memory hog."
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> >> Bottom line is that I don't believe any kind of AI in OOM will do
> >> the right job.
> >
> >So give us a better solution. One that also works for
>
> The task-selection algorithm has nothing to do with AI. It doesn't
> know anything about the past, and it is not going to learn anything
> at runtime.

True. The present system bludgeons random processes until it hits the
right one; the modified code uses a slightly more sensible algorithm
to pick which to kill first. Yes, it could be improved further with
more information about processes, but we don't have that yet.

James.



