kernel panic "killing interrupt handler" and kernel BUG at sched.c:468

From: Federico Sevilla III (jijo@free.net.ph)
Date: Mon Sep 30 2002 - 07:13:24 EST


Hi everyone,

Our server had been running for 55 days on this 2.4.19-xfs kernel (XFS
CVS snapshot 20020809 patched with RML's preempt patches for 2.4.19-rc3
and sys-magic 20020314 from Randy Dunlap, built with GCC 3.1.1 on Debian
GNU/Linux) when I hit a kernel panic in the process running the
distributed-net client. I had been running the distributed-net client,
and everything else on the server, with no significant recent changes.
The server wasn't under any unusual load, either.

I copied the oops by hand onto another computer and am attaching its
ksymoops output as kernel-panic.out. I rebooted, and a few minutes after
initialization had completed I hit another kernel panic, again in the
distributed-net client process. That oops (passed through ksymoops) is
attached as kernel-panic-2.out.

After copying the oops message, I attempted to sync the disks using the
(Alt + SysRq + S) key combination and after the sync messages I hit a
kernel BUG at sched.c:568. In my sched.c (different from the XFS tree
only because of RML's preempt patch) line 568 is in the "asmlinkage void
schedule(void)" function. The oops (passed through ksymoops) is attached
as kernel-bug.out.

Some other notes that may be significant to mention:

    - system is an Intel Pentium III with 512MB RAM and a 3ware 6400 IDE
      RAID controller,
    - system has one small ext2 partition for /boot, one ReiserFS
      partition for Squid cache, and 5 XFS partitions,
    - system is an NFS server, with NFSv3 enabled in the kernel and
      running nfs-kernel-server 1.0.2,
    - system is not exclusively an NFS server; it is also a Samba, mail,
      and IRC server, and runs lm-sensors,
    - this happened during a lull in the load because everyone was on
      their way home at the end of our work day.

I am now recompiling the kernel from a current CVS snapshot of the XFS
tree, using Debian Sid's current default gcc (2.95.4 20011002) instead
of gcc 3.1.1, and without RML's preempt patch. (The SysMagic patch does
not touch sched.c and probably had nothing to do with this.) I turned
off distributed-net as soon as I rebooted the third time; the system has
stayed up so far and was able to recompile the kernel. I will turn
distributed-net back on once I boot the new kernel and will send a
follow-up if it panics again.

Pointers as to what probably caused this are welcome. If this is a "new"
issue, I hope the decoded oops messages will be of help. Thank you,
everyone, for your time.

 --> Jijo

-- 
Federico Sevilla III   :  http://jijo.free.net.ph
Network Administrator  :  The Leather Collection, Inc.
GnuPG Key ID           :  0x93B746BE


This archive was generated by hypermail 2b29 : Mon Sep 30 2002 - 22:00:45 EST