Re: 2.4.0-test11pre2-ac1 and previous problem

From: Andrew Morton (andrewm@uow.edu.au)
Date: Mon Nov 13 2000 - 06:16:02 EST


Keith Owens wrote:
>
> On Mon, 13 Nov 2000 09:58:17 +0100,
> Jasper Spaans <jasper@spaans.ds9a.nl> wrote:
> >All right, here's another one, this time using the oops directly from the
> >console -- this seems to give better symbols.. The 'console shuts up ...'
> >works, the oops from the other CPU didn't get put out.
>
> Ohhhh, damn! For NMI lockups we want the console to stay live so NMI
> detection on the other cpus can be printed. NMI is normally caused by
> spinlock problems and it is useful to know what the other cpus are
> doing. Andrew, do you want to have a go at fixing this?

Uh, sure - I just _love_ running fsck :) I'm working on this stuff
at present. That wake_up in printk() is baaaaad...

> >Will try test11-pre3 + kdb this afternoon, if it compiles.
>
> Patch kdb-v1.5-2.4.0-test11-pre3.gz should be OK.

It would be very, very interesting to see where the other CPU is.

I can see one bug from Jasper's trace: setscheduler() does:

        spin_lock_irq(&runqueue_lock);
        read_lock(&tasklist_lock);

whereas the exit_notify->do_notify_parent->send_sig_info->wake_up_process
path does:

        write_lock_irq(&tasklist_lock);
        spin_lock_irqsave(&runqueue_lock, flags);

Death by deadlock: the two paths take the same pair of locks in
opposite order. But I doubt that setscheduler() is the source - who
ever calls that?
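
A minimal userspace sketch of the inversion described above, using pthreads rather than kernel primitives. The two mutexes only stand in for runqueue_lock and tasklist_lock (using a plain mutex for tasklist_lock is a simplifying assumption, since the kernel lock is a rwlock), and every name here is hypothetical. It uses trylock and a barrier so that the stall is reported instead of hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Userspace stand-ins for the two kernel locks (hypothetical names). */
static pthread_mutex_t runqueue_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tasklist_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;

struct path {
    pthread_mutex_t *first;   /* lock taken first on this path */
    pthread_mutex_t *second;  /* lock wanted while holding 'first' */
    int second_rc;            /* result of the trylock on 'second' */
};

static void *run_path(void *arg)
{
    struct path *p = arg;

    pthread_mutex_lock(p->first);
    /* Wait until the other path also holds its first lock. */
    pthread_barrier_wait(&both_hold_first);
    /* A blocking lock here would never return; trylock reports it. */
    p->second_rc = pthread_mutex_trylock(p->second);
    /* Keep holding 'first' until both sides have tried. */
    pthread_barrier_wait(&both_hold_first);
    if (p->second_rc == 0)
        pthread_mutex_unlock(p->second);
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns 1 if each path was stuck waiting for the other's lock. */
int inversion_detected(void)
{
    struct path sched = { &runqueue_mtx, &tasklist_mtx, 0 };
    struct path exitp = { &tasklist_mtx, &runqueue_mtx, 0 };
    pthread_t a, b;
    int rc;

    rc = pthread_barrier_init(&both_hold_first, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, run_path, &sched);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, run_path, &exitp);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&both_hold_first);
    return sched.second_rc == EBUSY && exitp.second_rc == EBUSY;
}
```

Calling inversion_detected() from a small main() and building with -pthread should show both paths failing to get their second lock, which is the state in which two blocking acquisitions would wait on each other forever.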

The correct locking hierarchy is, I think:

        spin_lock(runqueue_lock)
        read/write_lock(tasklist_lock)
        read/write_unlock(tasklist_lock)
        spin_unlock(runqueue_lock)
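
The hierarchy above can be sketched in userspace as follows. This is an illustration under stated assumptions, not kernel code: a pthread mutex stands in for runqueue_lock, a pthread rwlock for tasklist_lock, and the function names and counters are invented for the example. Both paths take the outer lock first and release in reverse order, so neither can hold one lock while waiting for a holder of the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define LOOPS 10000

/* Hypothetical stand-ins: outer lock first, inner lock second. */
static pthread_mutex_t  runqueue_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t tasklist_rw  = PTHREAD_RWLOCK_INITIALIZER;
static long nr_reads;    /* only touched with runqueue_mtx held */
static long nr_updates;  /* only touched with runqueue_mtx held */

static void *reader_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < LOOPS; i++) {
        pthread_mutex_lock(&runqueue_mtx);      /* outer */
        pthread_rwlock_rdlock(&tasklist_rw);    /* inner, shared */
        nr_reads++;
        pthread_rwlock_unlock(&tasklist_rw);
        pthread_mutex_unlock(&runqueue_mtx);
    }
    return NULL;
}

static void *writer_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < LOOPS; i++) {
        pthread_mutex_lock(&runqueue_mtx);      /* outer */
        pthread_rwlock_wrlock(&tasklist_rw);    /* inner, exclusive */
        nr_updates++;
        pthread_rwlock_unlock(&tasklist_rw);
        pthread_mutex_unlock(&runqueue_mtx);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the total iterations done. */
long run_ordered(void)
{
    pthread_t r, w;
    int rc;

    nr_reads = 0;
    nr_updates = 0;
    rc = pthread_create(&r, NULL, reader_path, NULL);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer_path, NULL);
    assert(rc == 0);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return nr_reads + nr_updates;
}
```

Because every acquisition follows the same order, run_ordered() is expected to return rather than hang, with each thread completing all of its iterations.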

Jasper, as a random stab in the dark you may care to try this:

--- linux-2.4.0-test11-pre4/kernel/exit.c Sun Oct 15 01:27:46 2000
+++ linux-akpm/kernel/exit.c Mon Nov 13 22:05:37 2000
@@ -381,8 +381,10 @@
          * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2)
          */
 
-        write_lock_irq(&tasklist_lock);
+        read_lock_irq(&tasklist_lock);
         do_notify_parent(current, current->exit_signal);
+        read_unlock_irq(&tasklist_lock);
+        write_lock_irq(&tasklist_lock);
         while (current->p_cptr != NULL) {
                 p = current->p_cptr;
                 current->p_cptr = p->p_osptr;
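
One possible reading of why taking tasklist_lock for read around do_notify_parent() could help is that readers share the lock: a CPU already holding runqueue_lock could still get its read_lock(&tasklist_lock) and move on. That reasoning is an interpretation, not something stated in this message. The userspace sketch below only illustrates the sharing property, with a pthread rwlock as a stand-in and hypothetical names throughout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for tasklist_lock. */
static pthread_rwlock_t tasklist_rw = PTHREAD_RWLOCK_INITIALIZER;

struct probe {
    int rd_rc;  /* result of tryrdlock from the second thread */
    int wr_rc;  /* result of trywrlock from the second thread */
};

/* Plays the role of the other CPU while the lock is read-held. */
static void *other_cpu(void *arg)
{
    struct probe *p = arg;

    p->rd_rc = pthread_rwlock_tryrdlock(&tasklist_rw);
    if (p->rd_rc == 0)
        pthread_rwlock_unlock(&tasklist_rw);

    p->wr_rc = pthread_rwlock_trywrlock(&tasklist_rw);
    if (p->wr_rc == 0)
        pthread_rwlock_unlock(&tasklist_rw);
    return NULL;
}

/* Holds the lock for read, then lets a second thread probe it. */
void probe_while_read_held(struct probe *out)
{
    pthread_t t;
    int rc;

    pthread_rwlock_rdlock(&tasklist_rw);
    rc = pthread_create(&t, NULL, other_cpu, out);
    assert(rc == 0);
    pthread_join(t, NULL);
    pthread_rwlock_unlock(&tasklist_rw);
}
```

With no writer involved, the second thread's read attempt is expected to succeed while its write attempt reports EBUSY, which is the difference between the old write_lock_irq() and the patched read_lock_irq() from the point of view of a concurrent reader.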