Re: Avoiding OOM on overcommit...?

From: Marco Colombo (marco@esi.it)
Date: Tue Mar 28 2000 - 14:07:20 EST


On Tue, 28 Mar 2000, Peter T. Breuer wrote:

> > That's simply a non overcommitting approach. And also a fixed-size approach.
> > Of course, it gives you a very predictable behaviour. The price
> > you pay is that system throughput is 1/10 of an overcommitting system
> > on the same HW. Is your shell 'secure'? And every 'ls' you perform?
>
> There is no penalty. What is being reserved is available swap space,
> not ram. You will have the same system operational characteristics as
> at present. It's just that on a machine with up to 256 processes
> available, you will have to provide 2GB of swap (which will never be
> used in normal practice).

So you can run only 256 'ls', while I (on my overcommitting system)
run 25600 of them.

> As to which processes are secure, that's up to you to decide. You are
> being given the choice under this scheme, with guaranteed no OOM
> for your secure processes, and no behavioural penalty. Just 2GB of
> total backing swap required, just in case, and only to be able to give
> you the guarantee you asked for.

2GB reserved *for stack*... you need more for "normal" paging.

>
> > Including RAM as a part of VM is a mistake. Consider a 64MB RAM +
>
> It's possible to provide a similar scheme in which Virt = Ram + Swap1,
> Swap = Swap1 + Swap2, and one maintains sufficient reserve in Swap2.
> I didn't put it up here, largely because I wasn't confident of
> getting it right first time in front of an audience.
>
> > 64 MB swap system.
> >
> > t1) Process A requests 60MB via brk(), the system has 128MB of VM,
>
> It can't, or shouldn't. But I'll give you it as a hypothetical
> possibility. Note that a 60MB stack is very unusual and only root can
> do it. So it's root's shoot yourself in the foot idea, and all
> disclaimers apply.
>
> > so it allocates 60MB to process A. Available VM is 68MB.
>
> OK.
>
> > t2) Process B starts, and requests 60MB. The system has 68MB of VM,
> > so it allocates 60MB to process B. Available VM is now 8MB.
>
> OK.
>
> > t3) Process A writes all pages, so they are now in-core and dirty.
>
> OK.
>
> > t4) Process B writes all pages. The kernel has now to page-out A
> > pages to swap. About 60MB of swap gets used.
>
> Well, yes it has to page out 56MB of A's pages. But that's fine.
                             ^^^^^^
about 60MB == 56MB + (some RAM has to ge used by the system...) B-)

> It has to do that anyway, and it is paging in (notionally) 60MB
> of B's pages.
>
> > B pages are in-core, and dirty.
>
> Yes.
>
> > t5) Process A writes all pages again. The kernel has to page-in A pages,
> > so it has to page-out B before. But there are only 4MB of free swap.
>
> ??. No it has to page out 4MB of A's pages and page in 4MB of B's, etc
> etc. etc., until finished.

What? You're assuming that page-in frees swap? What for? Swap space
remains allocated after a page-in, i think. I hope! I really hope
the kernel DOES NOT allocted / deallocate disk blocks at EVERY page-out/
page-in. It allocates swap only once. Either at brk() or fork() time
(non overcommitting) or at the first page-out (overcommitting).
Swap gets deallocated only when a process exits. That why, in a OOS
situation, the only solution is killing.

OOS should be avoided. Full stop. Moreover, it's likely that you reach
OOS only after the system is completely stuck with thrashing, which in
turn should be avoided. On a correctly configured system, you should never
see OOS. Killing is not a proposed system behaviour for *normal* conditions..

> > OOS. You should check free *swap* space, not free VM.
>
> Why? I don't follow. Is this a performance issue? You had to page out
> A and page in B, and vice versa. It was unavoidable. The limit imposed
> by the accounting I proposed only prevents you launching MORE processes
> now (process C, for example). It doesn't at all affect the behaviour of the
> two processes A and B that you are considering.
>
> > I don't follow BSD any more, but at 4.3 time I'm pretty sure it allocated
> > *swap*. And it checked for available swap before allowing process creation
> > (or grow). In the example above, at t2) the kernel refuses to give 60MB
> > to process B. To run 2 60MB-sized processes you need >120MB of swap, no
> > matter how much RAM you have.
>
> This is a distinct approach. As I said, I wasn't confident of getting
> the accounting rationale correct in public, so I put up a simpler rule,
> which also clearly works, and whose only effect is to tell the system
> when it can launch a process "securely". It doesn't affect the
> processes behaviour if they are launched.

The above example was just to state that you should ignore RAM when you
preallocate. Do all computations over free *swap* space. Do not count
free RAM as free VM.

>
> Peter
>

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:22 EST