Panic on all memory used?

From: Botha, Francois (francoisb@thawte.com)
Date: Tue May 06 2003 - 07:24:01 EST


Hi,

I'm sure some of us has come across a out-of-memory Linux box, where all
RAM and Swap has been used and the machine just grinds to a halt, but no
full crash (it ping replies!). Something like:

--
 16:03:36 up 21 days, 49 min,  2 users,  load average: 304.70, 264.89,
157.60
 392 processes: 82 sleeping, 307 running, 3 zombie, 0 stopped
 CPU states:   0.0% user,  99.9% system,   0.0% nice,   0.1% idle
 Mem:   3624196K total,  3618928K used,     5268K free,     4792K
buffers
 Swap:  2000084K total,  2000084K used,        0K free,    17312K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 21579 www-data 20 0 4116 3932 2000 R 3.6 0.1 0:06 apache 21576 www-data 20 0 3576 3392 1980 R 3.5 0.0 0:06 apache --

At this point a machine tries to kill off processes (based on what criteria?) but most of the time such a machine rarely "recovers" and becomes usable again. Would it be very wrong to suggest that maybe a kernel panic at this point might be a good idea? The saviour could then be /proc/sys/kernel/panic :P (above example is a paste from a 2.4.20 system).

Don't get me wrong though, yes, processes should clean up after themselves, this kind of thing "should" never happen, but what to do when it does under whichever circumstances and the server in question is co-located and involves lengthy phonecalls? Does anybody have any suggestions kernel-wise in sitsuations like this in production environments?

Regards, Francois Botha

Snr. Systems Engineer e-mail: francoisb@thawte.com http://www.thawte.com


- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed May 07 2003 - 22:00:25 EST