Re: lockups with 2.4.20 (tg3? net/core/dev.c|deliver_to_old_ones)

From: Rhodes, Tom (tom.rhodes@hp.com)
Date: Wed Feb 26 2003 - 02:00:07 EST


>> Since sometime in December two systems we have on site using P4 HT
(one
>> Dell 2650 and one Dell 4600, both dual CPU, both ht/mce capable) have
been
>> locking up without any kernel output and without sysrq keys working
(the
>> keyboard is locked solid).
>>[...]
>> Using nmi_watchdog I've managed to get a stack track and ran ksymoops
over
>> it (attached).

> Good report. To tell the truth, I know that this lockup exists,
> there's an RH issue-tracker item against me on this.

Several of us at HP have been chasing this problem as well. Here is why
there is a deadlock: deliver_to_old_ones()attempts to stop all timers
from running and then blocks until all the timers are no longer running.
This code is called from netif_receive_skb which is called from tg3_poll
while it is holding a lock in the tg3 driver. On another CPU, the
tg3_timer routine is run but is blocked by the lock held in the tg3_poll
routine. The tg3_timer routine never finishes because it can't acquire
the lock being held by tg3_poll on another CPU. That prevents
deliver_to_old_ones from executing because there is still a timer
routine executing.

Here is the call stack of the deadlocked CPUs on a RH8.0 system with a
2.4.18-24.8.0 smp kernel:
CPU 2:
deliver_to_old_ones+45
netif_receive_skb
tg3_rx+27b
tg3_poll+81
net_rx_action
do_softirq
do_IRQ
call_do_IRQ

CPU 6:
tg3_timer (tg3+9fc4)
run_timer_list+0x112
bh_action+55
tasklet_hi_action+67
do_softirq+d9
do_IRQ
call_do_IRQ+5

Thanks
Tom Rhodes
tom.rhodes@hp.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 28 2003 - 22:00:34 EST