Re: lockups with 2.4.20 (tg3? net/core/dev.c|deliver_to_old_ones)

From: Denis Vlasenko (vda@port.imtp.ilyichevsk.odessa.ua)
Date: Wed Feb 26 2003 - 07:45:16 EST


On 26 February 2003 09:00, Rhodes, Tom wrote:
> >> Since sometime in December two systems we have on site using P4 HT
> >> (one Dell 2650 and one Dell 4600, both dual CPU, both ht/mce
> >> capable) have been locking up without any kernel output and without
> >> sysrq keys working (the keyboard is locked solid).
> >>[...]
> >> Using nmi_watchdog I've managed to get a stack trace and ran
> >> ksymoops over it (attached).
> >
> > Good report. To tell the truth, I know that this lockup exists,
> > there's an RH issue-tracker item against me on this.
>
> Several of us at HP have been chasing this problem as well. Here is
> why there is a deadlock: deliver_to_old_ones() attempts to stop all
> timers from running and then blocks until all the timers are no
> longer running.

Here is the wait loop, from linux/interrupt.h:

static inline void tasklet_unlock_wait(struct tasklet_struct *t)
{
        while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
}

Yes, this can run forever if TASKLET_STATE_RUN is never cleared.
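
To make the failure mode concrete, here is a rough user-space sketch of
the same kind of wait, with a spin limit added so it can give up. This is
not kernel code: bounded_unlock_wait(), RUN_BIT and RUN_MASK are names
made up for illustration, and the real tasklet_unlock_wait() has no limit
at all. It only shows that the loop cannot exit unless somebody else
clears the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's TASKLET_STATE_RUN bit. */
#define RUN_BIT  1u
#define RUN_MASK (1u << RUN_BIT)

/*
 * Poll *state until the RUN bit is clear, re-checking at most max_spins
 * times after the first look.
 * Returns 1 if the bit was seen clear, 0 if we gave up while it was
 * still set. The waiter never clears the bit itself; only the other
 * side can end the wait.
 */
static int bounded_unlock_wait(atomic_uint *state, unsigned long max_spins)
{
        assert(state != NULL);
        for (;;) {
                if (!(atomic_load(state) & RUN_MASK))
                        return 1;       /* other side finished */
                if (max_spins-- == 0)
                        return 0;       /* still running, give up */
        }
}
```

With the bit clear the function returns 1 on the first check. With the
bit set and nobody left to clear it, it returns 0 once the limit is hit,
which is the point where the unbounded kernel loop would keep spinning.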

> This code is called from netif_receive_skb which is
> called from tg3_poll while it is holding a lock in the tg3 driver. On
> another CPU, the tg3_timer routine is run but is blocked by the lock
> held in the tg3_poll routine. The tg3_timer routine never finishes
> because it can't acquire the lock being held by tg3_poll on another
> CPU. That prevents deliver_to_old_ones from executing because there
> is still a timer routine executing.
>
> Here is the call stack of the deadlocked CPUs on a RH8.0 system with
> a 2.4.18-24.8.0 smp kernel:
> CPU 2:
> deliver_to_old_ones+45
> netif_receive_skb
> tg3_rx+27b
> tg3_poll+81
> net_rx_action
> do_softirq
> do_IRQ
> call_do_IRQ
>
> CPU 6:
> tg3_timer (tg3+9fc4)
> run_timer_list+0x112
> bh_action+55
> tasklet_hi_action+67
> do_softirq+d9
> do_IRQ
> call_do_IRQ+5
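
The two stacks above amount to a circular wait: CPU 2 (in tg3_poll)
holds the driver lock and waits for the timer to stop running, while
CPU 6 (in tg3_timer) is the running timer and waits for the driver lock.
A small sketch of that as a wait-for graph, assuming exactly those two
actors. The enum labels and in_wait_cycle() are illustrative names only,
not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Actors in the model; labels are illustrative, not kernel symbols. */
enum { CPU2_TG3_POLL = 0, CPU6_TG3_TIMER = 1, NR_ACTORS = 2 };
#define NOBODY (-1)

/*
 * waits_for[i] names the actor that must make progress before actor i
 * can continue, or NOBODY if actor i is not blocked.
 * Returns 1 if following the edges from 'start' leads back to 'start',
 * i.e. 'start' is part of a circular wait; 0 otherwise.
 * Assumes every entry is NOBODY or a valid index below nr.
 */
static int in_wait_cycle(const int *waits_for, int nr, int start)
{
        int cur = start;

        assert(waits_for != NULL && start >= 0 && start < nr);
        for (int steps = 0; steps < nr; steps++) {
                cur = waits_for[cur];
                if (cur == NOBODY)
                        return 0;       /* chain ends, someone can run */
                if (cur == start)
                        return 1;       /* came back around: deadlock */
        }
        return 0;                       /* cycle exists but excludes start */
}
```

Filling the table with { CPU6_TG3_TIMER, CPU2_TG3_POLL } models the
situation in the traces and the check reports a cycle. Setting the
second entry to NOBODY (the timer does not need the lock held by
tg3_poll) breaks the cycle, which is the property any fix has to
restore.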

--
vda