Re: Some questions about linux kernel.

From: James Sutherland (jas88@cam.ac.uk)
Date: Thu Mar 16 2000 - 07:25:07 EST


On Thu, 16 Mar 2000, Peter T. Breuer wrote:

> "A month of sundays ago James Sutherland wrote:"
> > On Thu, 16 Mar 2000, Peter T. Breuer wrote:
> >
> > > "A month of sundays ago James Sutherland wrote:"
> > > > On Wed, 15 Mar 2000, Sean Hunter wrote:
> > > This is pure nonsense. You don't understand. I have _dozens_ of boxes.
> > >
> > > They work fine without OOM (i.e. in 2.0.*). With OOM they murder
> > > themselves.
> > >
> > > If Rik's patch gets rid of the current OOM behaviour, then I am all in
> > > favour of it. It cannot make things worse. But why not just get rid of
> > > it? Things worked fine in 2.0.*, as far as I can see. I didn't get
> > > cron and init being killed.
> >
> > "Get rid of OOM". Wonderful! Now, if you'll just send us a copy of your
> > patch which gives the kernel infinite memory...
>
> Please read what I said instead of twisting the words! Get rid of the
> current OOM _behaviour_ is what I said. The 2.0.* behaviour was fine.
> 2.2.* and up seem to indulge in random killing for no reason at all,
> which I put down to false positives from a kill-happy something in
> the kernel.

Sorry - I must have misinterpreted your "just get rid of it". Have you
been able to watch this happen (keeping an open "top", for example)??

> My boxes are _not_ out of memory. They all have at least 64M, and all
> have swap of at least 64M too. They all run perfectly normal processes.
> They have been doing this for years. Switching away from 2.0.* kernels
> caused them to start dieing like flies with dead inits, dead crons, dead
> whatevers. Only the lab boxes do this. Personal workstations work
> fine. I presume the error situations are corrected without thinking by
> someone sitting at a terminal who sees a poorly responding netscape, or
> some such, or whose machine loses contact. Left on their own, the
> boxes are brought down by a killer presumably in the kernel,because
> that diagnosis enabled counter measures that have been 99% effective
> (to wit ... maintain key daemons from init, in particular cron, who has
> to check other daemons too ..).

Do you use a software watchdog, too? If you have one of these, set NOT to
disarm when the device is closed, you should be pretty safe - if all your
processes die (including init) the system reboots, if init is still alive
*and functioning* the watchdog will be kept alive too.

> > You CANNOT "get rid of" running out of memory. You have a finite amount of
> > memory, with various processes all taking chunks of it - once it's all
> > gone, you are OOM. Now, how do you "get rid of" the problem? The clone()
> > syscall doesn't work on DRAM...
>
> I don't have a problem. The current OOM behaviour does. If Rik's patch
> mends it, then I am all for it.

Having hit the current behavious last night, I wholeheartedly agree :)
(Being stuck watching my load average explode, and the disks thrash like
an NT box handling a logon, while unable to do anything at all, is not an
experience I would recommend!)

James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:19 EST