"almost" a server crash: syslog & keventd involved

From: Silviu Marin-Caea (silviu@genesys.ro)
Date: Wed Aug 07 2002 - 03:31:57 EST

This morning I found the mail server (SuSE 8 all updates) in a weird
state. Mail wasn't working, DNS wasn't working, couldn't ssh into it.
Luckily, I had a console open from yesterday and was able to ps -Al.

Here's what I've got:

After killing the cron and pop3

Also, load average was 2.

Notice the syslogd line in "ps -l", when in weird state

006 D 0 519 1 0 80 0 - 352 flush_ ? 00:03:46 syslogd
Same with ps -f:
root 519 0.0 0.0 1408 312 ? D Jul15 3:46 /sbin/syslogd

and in normal state

006 S 0 523 1 0 80 0 - 352 do_sel ? 00:00:01 syslogd

man ps says
        D uninterruptible sleep (usually IO)

There is this too:
007 Z 0 2 1 0 80 0 - 0 do_exi ? 00:00:00 keventd <defunct>
with ps -f
root 2 0.0 0.0 0 0 ? Z Jul15 0:00 [keventd
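
A minimal sketch, assuming a Linux /proc filesystem, of how entries like the
two above (state D and state Z) could be listed without relying on ps. The
function names parse_stat and find_stuck are hypothetical, chosen only for
this illustration; the only thing taken from the kernel interface is that
/proc/<pid>/stat starts with the pid, the command name in parentheses, and
a one-letter state.

```python
import os


def parse_stat(text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so use the first '(' and the last ')' as delimiters.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def find_stuck(proc_root="/proc", states=("D", "Z")):
    """Return a sorted list of (pid, comm, state) for processes in `states`."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file is unreadable
        if state in states:
            stuck.append((pid, comm, state))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm, state in find_stuck():
        print(f"{pid:>6} {state} {comm}")
```

Run from a console that is still alive, this would only report which
processes are in uninterruptible sleep or defunct; it does not explain why
they got there.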

Couldn't get it out of the weirdness without a reboot (ugh). Now, after
the crisis, I think that maybe I should've restarted syslog. And,
besides, the deep magic of keventd is beyond me, just yet.

Considering that daily cronjobs run at 0:15, I think the problem was
there. This is the last file in /var/log:
"Wed Aug 07 00:17:21 2002 mail-20020807.gz"
and this is the last message in /var/log/messages:
"Aug  7 00:17:06 mail named[24944]: lame server"
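
The same two data points (the most recently modified file in a log
directory and the last line written to a log) could be gathered with a
short script. This is only a sketch: newest_file and last_line are
hypothetical helper names, and the /var/log paths in the main block are
the ones mentioned above, which usually need root to read.

```python
import os
import time


def newest_file(directory):
    """Return (path, mtime) of the most recently modified regular file, or None."""
    newest = None
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and special files
        mtime = os.path.getmtime(path)
        if newest is None or mtime > newest[1]:
            newest = (path, mtime)
    return newest


def last_line(path):
    """Return the last non-empty line of a text file ('' if there is none)."""
    last = ""
    with open(path, errors="replace") as fh:
        for line in fh:
            if line.strip():
                last = line.rstrip("\n")
    return last


if __name__ == "__main__":
    found = newest_file("/var/log")
    if found is not None:
        path, mtime = found
        print(time.ctime(mtime), path)
    print(last_line("/var/log/messages"))
```

If the newest file and the last message both stop at about the same time,
as they do here, that narrows down when logging stalled.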

My conclusion is that the syslogd restart after the daily jobs ran did
not go well. What could have caused this? And what about keventd?

I was planning on going on vacation knowing that I have a reliable linux
server (it's been running for a couple of months). Now there is doubt :-(

Silviu Marin-Caea   Systems Engineer Linux/Unix   http://www.genesys.ro
Phone +40723-267961

