Re: Avoiding OOM on overcommit...?

From: Andreas Dilger (adilger@home.com)
Date: Fri Mar 24 2000 - 18:58:08 EST


Olaf Weber writes:
> In a similar vein, if a process dies due to SIGXCPU or SIGXFSZ that
> you know it dies because it ran into a cpu or file limit. Yes, I
> would like to have some OS support from an overcommitting OS, so that
> if it decides to kill a process of mine because it ran out memory I
> get SIGXMEM instead of SIGBUS.

If you look at the AIX documentation, they have a signal like this. The
following is quoted from the following URL, and I urge people posting to
this OOM thread to read this document to get a better understanding of
what a system with (optional) no-overcommit behaves like.

http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/pag_space_under.htm

   "The system monitors the number of free paging space blocks and
   detects when a paging-space shortage exists. When the number of free
   paging-space blocks falls below a threshold known as the paging-space
   warning level, the system informs all processes (except kprocs)
   of this condition by sending the SIGDANGER signal. If the shortage
   continues and falls below a second threshold known as the paging-space
   kill level, the system sends the SIGKILL signal to processes that are
   the major users of paging space and that do not have a signal handler
   for the SIGDANGER signal (the default action for the SIGDANGER signal
   is to ignore the signal). The system continues sending SIGKILL signals
   until the number of free paging-space blocks is above the paging-space
   kill level."

Those advocates of "no OOM possible" implementations should read the part:

   "The AIXwindows server currently requires more than 250MB of paging space
   when the application runs in early allocation mode."

It is interesting to note that memory overcommit has recently been ADDED
to AIX, arguably one of the most popular MISSION CRITICAL BUSINESS Unixes.
It is possible, however to set "early allocation mode" for some programs
so that they will always have enough swap space, and will never be killed
by OOM. This requires that VM size = swap size, rather than the Linux
VM size = swap size + RAM size.

I think if Linux has a "pre-KILL" signal like SIGDANGER, which lets programs
know that they will be killed if some memory isn't freed soon, all will be
well (IMHO). If a program is important enough to survive overcommit (i.e.
Apache, system daemons, etc) they can handle SIGDANGER, and can free up some
memory (assuming free() returns memory to the system), or run with "early
allocation" so that they will not be the programs to be killed when OOM
happens.

Cheers, Andreas

-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:14 EST