Re: Avoiding OOM on overcommit...?

From: Marco Colombo (marco@esi.it)
Date: Fri Mar 31 2000 - 04:17:03 EST

Next message: Adrian Bridgett: "af_irda SO_* undeclared (2.3.99-pre4-2)"
Previous message: Tom Holroyd: "Re: 2.3.99-pre3 NFS unmount/mount kmem_create error"
In reply to: Linda Walsh: "Re: Avoiding OOM on overcommit...?"
Next in thread: Rik van Riel: "Re: Avoiding OOM on overcommit...?"
Reply: Rik van Riel: "Re: Avoiding OOM on overcommit...?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 30 Mar 2000, Linda Walsh wrote:

> Marco Colombo wrote:
> > So you can run only 256 'ls', while I (on my overcommitting system)
> > run 25600 of them.
> ---
> *and*? Is this a problem? Remember -- we are talking about what support
> the kernel gives to allowing you to configure as you wish. With the same kernel setup,
> I can run 256 'ls', you can allocate your 1T of *virtual* swap (swap that exists only
> for 'accounting' but has no physical backing -- allowing a user to define how much
> overcommit they want to allow on their system).

OK, I see. But I'm missing something: how do you mark a process as
'secure'? For example, I might want my shell to be secure (so it never
gets killed). What about the processes it forks? Are they 'secure', too?

And, most importantly: if my shell is 'secure' and the system is OOM, can
I still issue commands? Does the shell need to malloc()? Will fork()
even succeed?

Having a configurable, overcommittable additional swap space (if I
understand correctly) is just fine. But it's a system-wide decision.
Even if you can turn it off for some processes, I think the processes
themselves should not be aware of it. So the shell above won't be
that useful in an OOM situation (a 'secure-mode'-aware shell could have
most of the useful commands built in, with pre-allocated buffers, so
it can list processes, kill them, and the like).
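Linda's proposal can be pictured as strict commit accounting whose limit is the real backing store plus a configurable amount of accounting-only "virtual swap". The Python sketch below is a toy model under that assumption, not kernel code; the class and function names, and the figure of 1 MB reserved per `ls`, are hypothetical and chosen only so the numbers reproduce the 256 vs 25600 contrast quoted above.

```python
class CommitAccounting:
    """Toy system-wide commit accounting (hypothetical, not kernel code).

    backed_mb  -- memory with real backing (RAM and/or swap)
    virtual_mb -- accounting-only 'virtual swap' with no physical backing;
                  this is the configurable overcommit allowance
    """

    def __init__(self, backed_mb, virtual_mb=0):
        self.limit = backed_mb + virtual_mb   # one limit for the whole system
        self.committed = 0

    def reserve(self, size_mb):
        """Reserve at allocation time (e.g. brk()/fork()); False models ENOMEM."""
        if self.committed + size_mb > self.limit:
            return False
        self.committed += size_mb
        return True

    def release(self, size_mb):
        """Return a reservation, e.g. when the process exits."""
        self.committed -= size_mb


def count_processes(acct, per_process_mb):
    """Start identical processes until a reservation fails."""
    n = 0
    while acct.reserve(per_process_mb):
        n += 1
    return n


# Hypothetical: each 'ls' reserves 1 MB; 256 MB of real backing store.
strict = count_processes(CommitAccounting(256), 1)
overcommit = count_processes(CommitAccounting(256, virtual_mb=25344), 1)
print(strict, overcommit)  # 256 25600
```

Because the limit is a single number, the virtual allowance is necessarily a system-wide setting: individual processes never see it, they only get more or fewer failed reservations.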

I agree that on a dedicated server (such as a Web server, with no user
accounts and little system activity), controlling overcommit is useful.

> > What? You're assuming that page-in frees swap? What for? Swap space
> > remains allocated after a page-in, I think. I hope! I really hope
> > the kernel DOES NOT allocate / deallocate disk blocks at EVERY page-out/
> > page-in. It allocates swap only once: either at brk() or fork() time
> > (non-overcommitting) or at the first page-out (overcommitting).
> > Swap gets deallocated only when a process exits. That's why, in an OOS
> > situation, the only solution is killing.
> ---
> There are two swap model systems that I'm aware of. One, an older model that I
> don't see so often is to have each physical page in memory backed by a physical page on
> disk. This was how SunOS functioned back in the 3.x days (circa 1989). I always
> thought that was silly -- but if you had 48 Meg of memory, you had to have 48 Meg of
> swap -- and you could only run a total of 48 Meg of stuff (OS included).

Right. I think this was a requirement for 'swapping' (as opposed to
'paging'). The only way to make room (RAM) for processes was to take one of
them *completely* to disk.
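The difference between the two swap models Linda describes comes down to how the total runnable size is computed. A minimal Python sketch, assuming the simplification that in the fully backed (SunOS 3.x-style) model RAM acts only as a cache over swap, while in the IRIX/Linux model swap is added to RAM; the function names are hypothetical.

```python
def usable_vm_backed(ram_mb, swap_mb):
    """Fully backed model: every resident page also owns a page on disk,
    so RAM is only a cache and the total you can run equals swap size.
    (ram_mb is unused; it is kept for a symmetric signature.)"""
    return swap_mb


def usable_vm_additive(ram_mb, swap_mb):
    """Additive model: swap extends RAM, so VM = physical + swap."""
    return ram_mb + swap_mb


# Linda's SunOS example: 48 MB RAM, 48 MB swap.
print(usable_vm_backed(48, 48))    # 48
print(usable_vm_additive(48, 48))  # 96

# Linda's IRIX/Linux example: 64 MB RAM, 64 MB swap, two 60 MB processes.
print(usable_vm_additive(64, 64) - 2 * 60)  # 8 MB left for the kernel
```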

> In IRIX -- as well as in Linux (correct me if I'm wrong), memory on disk is
> *added* memory space in which to run programs. It's a 'virtual memory' size = physical
> + swap. If you only have 64Mb of physical memory and you have two 60 meg processes, you'd
> better hope the kernel only takes 8 Meg (assume the backing of 64M you mentioned).
>
> When a page is read in off disk, the swap page should be marked as 'free'. If no writes

In Linux, I think the page is left in swap, and its swap space is not marked
as 'free' (or 'available'). Only when you *write* to the page (changing its
contents) does the kernel mark the swap space as 'free', because the copy on
disk is now out of date.

> occur to the page in memory and the page needs to be paged out, ideally it could first
> be checked if it was on the 'free' disk-swap list and hadn't been recycled yet. If so
> you can simply reclaim it and not even write the clean page to swap. If the in-memory

That's the idea. But if the page has only been read, you should not free
its swap space, because it still contains valid data.

> page is dirty, you have to do a disk write -- you can still reclaim the same disk-page
> if available, otherwise, off of the 'unused' queue, if that is empty then claim on
> a page that would have been reclaimable by another in-memory page.

Whether you write to the same swap space or not doesn't really matter.

I was wrong about this. Allocating swap space on *every* page-out is fine:
such a page-out implies an I/O operation, which is slow, while the
allocation itself can be done in a few CPU cycles.
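The swap-slot behaviour discussed above (slot kept after a read-in, freed once the page is written, allocated lazily at page-out) can be modelled in a few lines. This is a toy Python sketch of that description under those assumptions, not the actual Linux swap code; `Page`, `SwapModel` and their methods are hypothetical names.

```python
class Page:
    """One anonymous page in a toy swap model (hypothetical, not kernel code)."""

    def __init__(self):
        self.resident = True
        self.dirty = True    # freshly created data has no copy on disk
        self.slot = None     # swap slot holding a valid copy, if any


class SwapModel:
    def __init__(self, nslots):
        self.free_slots = list(range(nslots))
        self.writes = 0      # page-out I/O operations performed

    def page_out(self, page):
        if page.slot is not None and not page.dirty:
            # Clean page whose swap copy is still valid: drop it, no I/O.
            page.resident = False
            return
        if page.slot is None:
            if not self.free_slots:
                raise MemoryError("out of swap")
            # Allocate at page-out time; any free slot will do.
            page.slot = self.free_slots.pop()
        self.writes += 1
        page.dirty = False
        page.resident = False

    def page_in(self, page):
        # Reading the page back does NOT free its slot: the copy is still valid.
        page.resident = True

    def write(self, page):
        # Modifying the page makes the disk copy stale, so free the slot.
        assert page.resident
        page.dirty = True
        if page.slot is not None:
            self.free_slots.append(page.slot)
            page.slot = None


m = SwapModel(nslots=1)
p = Page()
m.page_out(p)   # first page-out: allocate a slot and write it
m.page_in(p)    # slot kept
m.page_out(p)   # clean page, valid copy on disk: no second write
print(m.writes) # 1
```

Freeing the slot only on write is what lets a page that is repeatedly read and evicted cost a single disk write.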

> Anyway -- not sure how much of that is done in Linux -- but I'm pretty sure --
> you read a page in off disk, then its location is in memory and the swap page is freed.
>
> You are still going to have mondo-page swapping (thrashing) activity if both of
> your 60M processes touch all their pages during their run cycle.
>
>
> > The above example was just to state that you should ignore RAM when you
> > preallocate. Do all computations over free *swap* space. Do not count
> > free RAM as free VM.
> ---
> But it *is* free VM. Suppose I have a 1G of physical mem and zero swap. If
> I 'preallocate', it will only be out of RAM -- since I have no swap.

Uh. You're right. I can think of two different "models".

Either you know you need RAM, e.g. on a dedicated proxy server. You
don't want paging activity there, for performance reasons, so you run it
with little or no swap. Here your VM is just RAM.

On the other hand, you may know that most of the processes you're going
to run have large address spaces but small working sets. Here your VM is
mostly swap, so suppose you have 64MB of RAM and 256MB of swap.
On such a system, RAM is always used as a kind of cache for process pages
(and, of course, as a cache for FS blocks, too).
If you let processes allocate more than 256MB of VM, you're reducing
the cache size and its effectiveness. On such a system it makes sense
NOT to include RAM in the VM counts (just as you don't include RAM
in counts of available disk space...).
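The two accounting policies can be compared with a short Python sketch. It is a hypothetical illustration using the figures from the thread (1 GB of RAM with no swap; 64 MB of RAM with 256 MB of swap), not a description of any kernel's actual overcommit heuristic; the function names are made up.

```python
def vm_budget(ram_mb, swap_mb, count_ram):
    """VM the system is willing to promise.

    count_ram=True:  RAM counts as VM (e.g. a proxy server run with
                     little or no swap).
    count_ram=False: RAM is only a cache over swap, so free RAM is not
                     counted as free VM.
    """
    return ram_mb + swap_mb if count_ram else swap_mb


def ram_lost_to_commit(swap_mb, committed_mb):
    """Worst case (all committed memory touched): MB that cannot fit in
    swap and must stay in RAM, shrinking the process/FS cache."""
    return max(0, committed_mb - swap_mb)


# Linda's case: 1 GB RAM, no swap. Not counting RAM leaves nothing to promise.
print(vm_budget(1024, 0, count_ram=True))    # 1024
print(vm_budget(1024, 0, count_ram=False))   # 0

# Marco's second case: 64 MB RAM, 256 MB swap.
print(vm_budget(64, 256, count_ram=False))   # 256
budget = vm_budget(64, 256, count_ram=True)  # 320
print(ram_lost_to_commit(256, budget))       # 64: the whole RAM cache
```

Neither policy fits every machine: on the swapless box, excluding RAM would refuse every allocation, while on the swap-heavy box, counting RAM lets commitments consume the cache.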

>
> --
> Linda A Walsh | Trust Technology, Core Linux, SGI
> law@sgi.com | Voice: (650) 933-5338
>

.TM.

-- 
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/


This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:29 EST