Re: what signal is used when overcommit occurs?

From: Jesse Pollard (pollard@tomcat.admin.navo.hpc.mil)
Date: Thu Apr 20 2000 - 07:43:19 EST


Michael Richardson <mcr@solidum.com>:
> I believe that I may have a compute server that is experiencing
> overcommit. I can't be sure, because I don't have top output at the
> exact time that random processes die. I haven't seen anything
> on the console (well, dmesg) about it. The processes die with SIGINT,
> and are often memory hungry. It never seems to happen under the debugger!
>
> From what we can tell, they'd like around 160Mb. We have 256M ram
> and 512Mb swap configured (2.2.12). But, I can't guarantee that there aren't
> several of them running, as this is a multiuser compute server.
>
> These processes use a lot of stack as well, but we've adjusted our
> ulimits, etc. as well and that usually results in SEGV.
>
> To recap:
> 1) what signal is used?

In mm/memory.c it uses SIGKILL. (I'm looking at 2.0.33 at the moment; I
don't think it was changed in 2.2.) There is a patch for 2.3 that modifies
the "random" behaviour to make it more deterministic. The function
you may want to look at is "oom".
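
Since you report SIGINT and the kernel code uses SIGKILL, it may be worth
confirming which signal actually terminates the jobs. Here is a minimal
sketch of a wrapper that does that, assuming Python 3 is available on the
box. The name describe_exit and the wrapper itself are hypothetical
helpers for illustration, not anything shipped with the kernel.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run the command given on the command line and report how it ended.
    proc = subprocess.run(sys.argv[1:])
    print(describe_exit(proc.returncode), file=sys.stderr)
```

Launching a job through the wrapper should show whether it died from
SIGKILL (consistent with the out-of-memory path), SIGSEGV (consistent with
the stack limit), or something else.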

> 2) is anything logged?

In mm/memory.c, there is an attempt to print "Out of memory for ..." in a
printk. This message never gets to syslogd, and if you don't have a
serial-line console there is a finite possibility that you won't see the
message on the console.

One thing you can do to at least watch for the condition is to run
"vmstat 5" and redirect the output to a disk file. This logs a vmstat
sample every 5 seconds and gives you some idea of the swap use. If swap is
filling up (the "swpd" total keeps growing, with steady activity in the
"so" column) then it is very likely that the OOM killer is what is causing
the random aborts. I realize that this process itself may get killed, but
even then the last log entry should fall within 5-10 seconds of the abort.
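
Plain vmstat output carries no timestamps, which makes it hard to line up
with the time of an abort. The sketch below is one way to add them,
assuming Python 3.7 or later and a vmstat whose header names the "swpd",
"so" and "free" columns. The function names and the log path argument are
hypothetical.

```python
import subprocess
import sys
import time


def parse_sample(header, line):
    """Map vmstat column names to integer values for one sample line."""
    return dict(zip(header.split(), (int(v) for v in line.split())))


def log_vmstat(path, interval=5):
    """Append timestamped vmstat samples to path until interrupted."""
    proc = subprocess.Popen(["vmstat", str(interval)],
                            stdout=subprocess.PIPE, text=True)
    header = None
    with open(path, "a") as log:
        for line in proc.stdout:
            fields = line.split()
            if not fields:
                continue
            if "swpd" in fields:
                header = line          # remember the column-name line
                continue
            if header is None or not fields[0].isdigit():
                continue               # skip the banner line above the names
            sample = parse_sample(header, line)
            log.write("%s swpd=%d so=%d free=%d\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                sample["swpd"], sample["so"], sample["free"]))
            log.flush()                # hand each entry to the OS right away


if __name__ == "__main__" and len(sys.argv) > 1:
    log_vmstat(sys.argv[1])
```

Each entry is flushed as soon as it is written, so if the logger itself is
killed the file should still end with a sample taken shortly before the
abort.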
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.
