Re: Overcommittable memory

From: James Sutherland (jas88@cam.ac.uk)
Date: Tue Mar 21 2000 - 07:07:50 EST


On Tue, 21 Mar 2000 11:38:20 +0000, you wrote:
>James Sutherland wrote:
>> >Blindingly simple example app:
>> >
>> >1) syscall - sys_overcommit(off) == this process will not overcommit
>>
>> No, that's NOT what that syscall does (despite the name...)
>>
>> In fact, setting it >0 simply disables ALL sanity checking on malloc.
>> You want 2GB on a 4MB 386, malloc() succeeds. A fairly specialist
>> setting, really.
>
>You seem to be misunderstanding what I mean. sys_overcommit(off) here is
>an imaginary syscall which disables overcommit completely on a
>per-process basis.

Right... What (fixed) size do you set the stack to?
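
For illustration: a process can at least cap its own stack up front
with setrlimit() - a minimal sketch, with an arbitrary 8MB figure:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Fix the stack ceiling up front: growth past this limit
           gets SIGSEGV instead of quietly taking more memory. */
        struct rlimit rl = { 8UL << 20, 8UL << 20 };  /* 8MB cur + max */
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            perror("setrlimit");
        /* ... rest of the program runs with a known stack size ... */
        return 0;
    }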

>> >2) Allocate 512MB of memory with brk(). May fail if this can't be 100%
>> >guaranteed. No problem, the user just ran it all of a second ago.
>>
>> This does NOT depend on (and indeed is not affected by) overcommit,
>> provided you touch the memory.
>
>See above.

So? If you touch the memory, overcommit doesn't come into play.

>> >3) mmap() a result file, say 1GB size. Map flags are MAP_SHARED,
>> >PROT_WRITE. May fail if the kernel cannot allocate structures. Ditto, no
>> >problem.
>> >
>> >4) Main loop, never allocate any memory during runtime. Use the 512MB as
>> >a heap. Or better yet, don't do any dynamic allocation. In fact, don't
>> >do any syscalls at all. Run here for a few weeks...
>>
>> Just run for a few weeks without logging or journalling anything. Hrm.
>> Bright design. Supposing there is a power failure or system crash
>> during this time?
>
>Don't be so pedantic. It can write to the result file at regular
>intervals using msync(). It's only a bloody example.

In which case, terminating it abruptly doesn't cause any significant
data loss after all.
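
A minimal sketch of that checkpointing pattern (the file name and the
1GB size are made up, most error handling omitted):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1UL << 30;                /* the 1GB result file */
        int fd = open("results.dat", O_RDWR | O_CREAT, 0644);
        char *res;

        ftruncate(fd, len);                    /* reserve file space */
        res = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (res == MAP_FAILED)
            return 1;

        /* ... main loop writes results through res ... */

        /* Periodic checkpoint: push dirty pages back to disk so an
           abrupt kill loses at most one interval's worth of work. */
        if (msync(res, len, MS_SYNC) != 0)
            perror("msync");
        return 0;
    }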

>> >5) Write results into mapped result file. msync()
>> >
>> >Would you not agree that once stage 4 is reached the program CANNOT
>> >POSSIBLY DIE for any reason whatsoever?
>>
>> Nope. I just poured coffee over the CPU. Only Nortel and Tandem
>> hardware handles that gracefully :-)
>
>See above.

The point is, your process CAN die for a whole variety of reasons,
including the sysadmin deciding you are hogging resources and
kill -9'ing you.

>> > That is, so long as the 512MB
>> >requested by brk() are reserved. No copy, fill or faulting required.
>> >Overcommit apps can happily allocate loads of memory on top of this, but
>> >they'll just die when memory is depleted to what was reserved.
>>
>> This is true with or without overcommit - they will just die
>> earlier/more often without it.
>
>Ok, I'm shocked how you can come to that conclusion from the example I
>gave. HOW can that example program die, apart from hardware failure as
>you so pedantically point out?

Disabling overcommit makes every process take up a bit more memory, so
you run out much sooner. Then things start dying. If you aren't making
ANY syscalls, then you are OK (provided you aren't killed for resource
hogging etc.) - but that was true anyway, unless the machine is
overloaded.

>> >It's the ability to request non-overcommit that's essential, and I don't
>> >see what you have against it. With the current scheme of things this is
>> >impossible.
>>
>> No it isn't. Just touch the memory on allocation.
>
>Inefficient, and pointless if you just have non-overcommit. Not to
>mention it doesn't work. See below.

It is not inefficient: with overcommit disabled, touching the pages
costs on the order of 100k extra instructions - about 1ms of execution
at initialisation. And it DOES work.
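
For reference, the "touch it" idiom is one write per page - a minimal
sketch, assuming the 512MB heap from the example:

    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        size_t i, len = 512UL << 20;           /* the 512MB heap */
        long pg = sysconf(_SC_PAGESIZE);
        char *heap = malloc(len);

        if (heap == NULL)
            return 1;
        /* Write one byte per page so every page is backed by real
           memory now, rather than faulted in lazily later on. */
        for (i = 0; i < len; i += pg)
            heap[i] = 0;
        return 0;
    }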

>> >(Yes, msync() can fail, but that's "easily" fixable with reserved pages)
>>
>> All in all, a very sloppy design. If you look at sensible number
>> crunching apps (Distributed.net and SETI@home are common examples)
>> they checkpoint themselves frequently. Any serious app should do this;
>> if it doesn't, that's just poor programming on your part.
>
>It's only an example. If you want to be that pedantic about it, then make
>it write results out every now and then. Probably one extra line of
>code.

At which point, it DOES handle being killed gracefully - so there's no
problem anyway.

>> If you want non-overcommitted memory, just touch it on allocation.
>> That's it. No need to mangle the kernel or anything else.
>
>Does not work. Your code pages (read-only, shared mapping) can get
>paged, and when they're hit again, you die.

No, because they have already been allocated. You can only die when
you access UNallocated pages, thus trying to allocate them.

> You can make sure nothing
>gets paged by using mlockall(MCL_CURRENT|MCL_FUTURE) but that has two
>problems:
>
>1) The free pages count is incorrect when there are any shared read-only
>mappings. This is a bug.

>2) When mlockall'd you have the nasty side-effect of having all mappings
>faulted in whether you want it or not - even if they're mapped with
>PROT_NONE. Not necessarily a bug, but not pleasant either.
>
>Again, why are you so opposed to non-overcommit (you're the most
>frequent poster on this topic)? Seeing as you've made it clear you'd
>never use it, why are you trying to stop us using it?

I am trying to point out why it doesn't achieve what you think it
does, and that it comes at much greater expense than you suggest.
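
(For reference, the mlockall() call being discussed is a one-liner -
a minimal sketch; note it normally needs root:)

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Pin everything mapped now and in the future into RAM; as
           noted above, this also faults in every mapping up front. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");
        return 0;
    }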

James.
