Re: Avoiding OOM on overcommit...?

From: Khimenko Victor (khim@sch57.msk.ru)
Date: Sun Mar 26 2000 - 15:53:20 EST


In <38DBBA93.B5EF5E66@storm.ca> Sandy Harris (sandy@storm.ca) wrote:
> Alan Cox wrote:
>>
>> > The explanation is not the issue. The issue is having programs that
>> > act correctly (check malloc() returns, don't follow null pointers,
>> > etc.) fail in ways that are quite difficult for their programmers
>> > to prevent.
>>
>> Indeed. And it's completely valid to kill them off. Right now nobody
>> has a serious commercial requirement for non-overcommit on Linux.

> I'm seriously confused by this thread.

Why?

> If I understand correctly, programs can malloc() memory, get back a
> valid pointer, then segfault later when they try to use the memory.

Just as a program can segfault in a too-deeply recursive procedure when the
system does not have enough resources to expand the stack... or when trying
to use memory in some other way...
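
If it is the mechanics that are confusing, here is a minimal sketch
(untested; the 1 GB figure is arbitrary -- use whatever exceeds RAM plus
swap on your box) of what people are complaining about:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        /* Pick something bigger than RAM + swap on the machine. */
        size_t size = 1024UL * 1024 * 1024;     /* 1 GB here, adjust */
        char *p = malloc(size);

        if (p == NULL) {
                /* The failure a careful programmer can actually handle. */
                perror("malloc");
                return 1;
        }

        /* With overcommit the pointer above may be "valid" even though
         * the memory is not really there.  Touching every page forces
         * the kernel to find real pages, and THAT is where this process
         * (or some other innocent victim) dies -- long after malloc()
         * reported success. */
        memset(p, 1, size);

        puts("survived");
        free(p);
        return 0;
}

The point being that the malloc() return value was checked and looked fine.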

> This strikes me as quite obviously Wrong.

Why? It happens when the system simply DOES NOT HAVE THE RESOURCES to keep all
programs running. Something has to give.

> I can see no question about whether this is a problem. To me, it clearly
> is, and a ghastly one at that. This breaks the basic behaviour of
> important system calls, making memory allocation unreliable.

It does not! ONCE MORE: something bad (an innocent program getting killed, or
whatever) can happen ONLY when the system does not have the resources to
continue without killing a task. WHICH task should be killed is another
matter. Overcommit does not change whether this happens; it can only affect
which task gets selected in such a situation.

> It is not clear to me why anyone would implement overcommit in the
> first place or, once the obvious problem is pointed out, why there
> should be discussion of anything other than how to fix the blunder.

Because without overcommit the problem would be WORSE, not better! Your system
would be able to keep FEWER programs running, not more. Yes, turning it off
may make the choice of the doomed task easier, but that's all.

> On the other hand, I see several people, some of whom I know are
> competent, defending the existing behaviour.

> Methinks not all those folk are clueless. More likely, I've missed
> something basic here. Would someone please explain it?

> Would it make everyone happy to have tunable parameters for this?
> Using R for RAM, S for swap, max Commitable is:

> C = aR + bS

> and sys admin sets a, b. Default is perhaps a = 1, b = .5 to keep
> paging to reasonable levels, but you might have a = b = 10 on one
> system to allow heavy overcommit and a = 1, b = 0 on another for
> minimum swapping.

This is all just stupid and not needed. What IS needed is reasonable tracking
of used memory and a user-level quota (not a per-task one, as we have now).
You CAN crash a database with or without overcommit in an OOM situation, just
as you can crash that same database by filling all the space on the disk. A
good database (like Oracle, DB2 or whatever) can be safely restarted once the
resources are reclaimed.
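
For contrast, the only knob we have today is the per-task rlimit -- an
untested sketch, with an arbitrary 64 MB cap:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl;

        /* Today's mechanism: a per-PROCESS cap on address space size.
         * Nothing stops one user from starting dozens of such
         * processes and eating the whole machine anyway. */
        rl.rlim_cur = 64UL * 1024 * 1024;       /* 64 MB, arbitrary */
        rl.rlim_max = 64UL * 1024 * 1024;

        if (setrlimit(RLIMIT_AS, &rl) != 0) {
                perror("setrlimit");
                return 1;
        }

        /* Allocations that would push THIS process past 64 MB should
         * now fail cleanly; a user-level quota would instead have to
         * account for all of the user's tasks together. */
        return 0;
}

That per-user accounting is exactly the extra book-keeping mentioned in the
P.P.S. below.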

P.S. IRIX has something akin to the proposed scheme. Most admins just set
b = infinity, while far fewer use a = b = 1.
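
For what it's worth, the arithmetic of that proposal is trivial; a throwaway
sketch with made-up numbers:

#include <stdio.h>

int main(void)
{
        /* Hypothetical values, just to show C = aR + bS. */
        double a = 1.0, b = 0.5;                /* the proposed tunables */
        double ram_mb = 128.0, swap_mb = 64.0;  /* a made-up box         */

        double commit_mb = a * ram_mb + b * swap_mb;

        printf("commitable memory: %.0f MB\n", commit_mb);  /* 160 MB */
        return 0;
}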

P.P.S. Oh, and to implement all this you need additional book-keeping of used
memory. I doubt Linus will accept such patches: in a normal situation they are
not needed at all, and it's not clear how they would help in a disaster...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


