Re: Terminate process that fails on a constrained allocation

From: Christoph Lameter
Date: Wed Feb 08 2006 - 14:01:04 EST


On Wed, 8 Feb 2006, Paul Jackson wrote:

> > F.e. a sysadmin may mistakenly start a process allocating too much memory
> > in a cpuset. The OOM killer will then start randomly shooting other
> > processes one of which may be a critical process.. Ouch.
>
> That same exact claim could be made to justify a patch that
> removed mm/oom_kill.c entirely. And the same exact claim
> could be made, dropping the "in a cpuset" phrase, on an UMA
> desktop PC.

Well the justification for the OOM Killer was that it keeps the system
as a whole alive ..... cpusets are not vital to the system. Usually the
folks using cpusets are very aware of the memory usage of their processes
and I think they would hate that a random process be killed.

The obvious case is that someone starts a process that allocates
large amount of memory while other processes are already running.

> The basic premise of the oom killer is that, instead of blowing
> off the caller, who might innocently have asked for one page
> too many after some other hog used up all the available memory,
> we try to pick the worst offender.
>
> Get the worst offender, not just who ever finally hit the limit.

Typically the worst offender is who just asked for the page.

> I've yet to be convinced that the oom killer is our friend,
> and half of me (not seriously) is almost wishing it were
> gone.

I definitely agree with you. Lets at least keep it from doing harm as much
as possible.

> Would another option be to continue to fine tune the heuristics
> that the oom killer uses to pick its next victim?
>
> What situation did you hit that motivated this change?

mbind to one node. Allocate on that node until you trigger the OOM killer.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/