Re: Some questions about linux kernel.

From: Peter T. Breuer (ptb@it.uc3m.es)
Date: Tue Mar 14 2000 - 10:32:54 EST


"A month of sundays ago Jason Gunthorpe wrote:"
> On Mon, 13 Mar 2000, James Sutherland wrote:
> > I want to emphasise this point: the OOM handler should **NEVER** be
> > invoked. If it is ever needed, there is something BADLY wrong with the
> > system - either we do something drastic (process slaughtering) or the
> > machine goes down completely. Worrying about which user processes are
>
> Just as someone who has been bitten by this, I agree with James. The
> number one priority is keeping the machine up - I don't care what you
> kill, just keep it running and preferably allow remote logins still...

The problem is that it is the process killer that is killing the
system, nothing else.

The usual scenarios here are:

a) httpd goes nuts because of some random bug and spawns 250 copies
   of itself and sits there humming. System begins to drag, sendmail
   begins to have trouble, connects start timing out leading to more
   sendmail reconnects and more sendmail processes. Procmail can't
   launch to do local deliver modes and sendmail throttling kicks
   in producing bursts of process creation, followed by procmail
   waiting forever on a single lock. Meanwhile nfs exports become
   clogged and the system begins to labour ....

       Cure: knock out httpd.

b) Someone leaves netscape running unattached. It tears up the network
   connection somehow. Networking begins to die, daemons like nfsd
   and named start to loop internally, memory drops, error messages
   spawned and can't be sent. Process killer kicks in. Kills cron,
   nfsd or mountd, inetd or init (in order of probability, as far as
   I can make out). System dead, won't reboot or take logins.

      Cure: stop the process killer. Things will just about work
      forever in this state. Not well, but forever.

Peter

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:28 EST