Re: Avoiding OOM on overcommit...?

From: James Sutherland (jas88@cam.ac.uk)
Date: Fri Mar 24 2000 - 18:50:55 EST


On Fri, 24 Mar 2000 11:24:41 -0500, you wrote:
>James Sutherland wrote:
>
>> >BUT "when I use up the resources given to me" - If the resources
>> >weren't available, why did the system give them to me?
>>
>> It didn't.
>
>If he did a [mc]alloc to ALLOCate the the memory and got a non-null
>return, it certainly said it did.

No - malloc() doesn't allocate MEMORY, it allocates ADDRESS SPACE to
put memory in later. This may not be what you wanted, but that's what
it does.

>> >> >Who gets killed - your process or mine?
>> >> Yours, because there aren't enough resources to run it.
>> >
>> >The system told me there were enough resources.
>>
>> You asked it if there was 128Mb of VM free; there was.
>
>If I used [mc]alloc(), I wasn't asking about free resources. I was
>asking for ALLOCation of memory.

No, you were actually asking for address space to put memory in.
That's where the problem arises.

>> Half an hour later, you try to use 128Mb of VM and fail.
>
>If a process can segfault trying to use memory it has succeeded in
>allocating, then the system has made a ghastly error.

It hasn't allocated any memory. It has allocated address space, and
indicated that AT THE TIME you COULD have filled it. You chose not to,
and when you then changed your mind, it was too late.

>> There is a rather simpler explanation than the nasty kernel having
>> lied to you...
>
>The explanation is not the issue. The issue is having programs that
>act correctly (check malloc() returns, don't follow null pointers,
>etc.) fail in ways that are quite difficult for their programmers
>to prevent.

No - they fail as a result of a system problem. They will also fail if
I unplug the machine, pour coffee over the CPU, type in "shutdown -r
now" at the console as root....

>For example, this implies one should not run a production database on
>a multi-user Linux machine since it could crash in non-recoverable
>ways.

If your database is so broken it can't handle being terminated by a
signal, you need a new database. There is certainly nothing
"non-recoverable" about it being terminated unexpectedly!

James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:14 EST