Re: [patch 00/61] ANNOUNCE: lock validator -V1

From: Benoit Boissinot
Date: Tue May 30 2006 - 07:41:34 EST


On Tue, May 30, 2006 at 12:26:27PM +0200, Arjan van de Ven wrote:
> On Tue, 2006-05-30 at 11:14 +0200, Benoit Boissinot wrote:
> > On 5/29/06, Ingo Molnar <mingo@xxxxxxx> wrote:
> > > We are pleased to announce the first release of the "lock dependency
> > > correctness validator" kernel debugging feature, which can be downloaded
> > > from:
> > >
> > > http://redhat.com/~mingo/lockdep-patches/
> > > [snip]
> >
> > I get this right after ipw2200 is loaded (it is quite verbose, I
> > probably shouldn't post everything...)
> >
> > ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
> > ipw2200: Detected geography ZZD (13 802.11bg channels, 0 802.11a channels)
>
>
> > <c0301efa> netlink_broadcast+0x7a/0x360
>
> this isn't allowed to be called from IRQ context, because it takes
> nl_table_lock for read, but that is taken as
> write_lock_bh(&nl_table_lock);
> in
> static void netlink_table_grab(void)
> so without disabling interrupts; which would thus deadlock if this
> read_lock-from-irq would hit.
>
> > <c02fb6a4> wireless_send_event+0x304/0x340
> > <e1cf8e11> ipw_rx+0x1371/0x1bb0 [ipw2200]
> > <e1cfe6ac> ipw_irq_tasklet+0x13c/0x500 [ipw2200]
> > <c0121ea0> tasklet_action+0x40/0x90
>
> but it's more complex than that, since we ARE in BH context.
> The complexity comes from us holding &priv->lock, which is
> used in hard irq context.

It is probably related, but I got this in my log too:

BUG: warning at kernel/softirq.c:86/local_bh_disable()
<c010402d> show_trace+0xd/0x10 <c0104687> dump_stack+0x17/0x20
<c0121fdc> local_bh_disable+0x5c/0x70 <c03520f1> _read_lock_bh+0x11/0x30
<c02e8dce> sock_def_readable+0x1e/0x80 <c0302130> netlink_broadcast+0x2b0/0x360
<c02fb6a4> wireless_send_event+0x304/0x340 <e1cf8e11> ipw_rx+0x1371/0x1bb0 [ipw2200]
<e1cfe6ac> ipw_irq_tasklet+0x13c/0x500 [ipw2200] <c0121ea0> tasklet_action+0x40/0x90
<c01223b4> __do_softirq+0x54/0xc0 <c01056bb> do_softirq+0x5b/0xf0
=======================
<c0122455> irq_exit+0x35/0x40 <c01057c7> do_IRQ+0x77/0xc0
<c0103949> common_interrupt+0x25/0x2c

>
> so the deadlock is like this:
>
>
> cpu 0: user context                           cpu1: softirq context
> netlink_table_grab takes nl_table_lock as     take priv->lock in ipw_irq_tasklet
> write_lock_bh, but leaves irqs enabled
>
>
> hardirq comes in and the isr tries to take    in ipw_rx, call wireless_send_event which
> priv->lock but has to wait on cpu 1           tries to take nl_table_lock for read
>                                               but has to wait for cpu0
>
> and... kaboom kabang deadlock :)
>
>
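
To check that I read the scenario above correctly, here is a minimal
userspace sketch of the wait-for cycle, not kernel code. Only the lock names
(nl_table_lock, priv->lock) come from the trace; the arrays, enum names and
functions are made up for illustration, and the "no hardirq" case is just my
assumption of what happens when the isr cannot run on cpu 0 while
nl_table_lock is held there.

```c
#include <stdio.h>

/* The two locks and two CPUs from the scenario; names are illustrative. */
enum { NL_TABLE_LOCK, PRIV_LOCK, NLOCKS };
enum { CPU0, CPU1, NCPUS };

/* holder[l]: CPU currently holding lock l, or -1 if the lock is free. */
int holder[NLOCKS];
/* waiting[c]: lock that CPU c is spinning on, or -1 if it is not waiting. */
int waiting[NCPUS];

void reset_state(void)
{
    for (int l = 0; l < NLOCKS; l++)
        holder[l] = -1;
    for (int c = 0; c < NCPUS; c++)
        waiting[c] = -1;
}

/*
 * Follow "CPU waits for a lock held by another CPU" edges from start.
 * Returns 1 if the chain leads back to start (start is part of a cycle),
 * 0 if the chain ends at a free lock or at a CPU that is not waiting.
 */
int in_wait_cycle(int start)
{
    int cpu = start;

    for (int steps = 0; steps < NCPUS; steps++) {
        int lock = waiting[cpu];
        if (lock < 0)
            return 0;
        int owner = holder[lock];
        if (owner < 0)
            return 0;
        if (owner == start)
            return 1;
        cpu = owner;
    }
    return 0;
}

/*
 * hardirq_on_cpu0 != 0: the isr interrupts cpu 0 while it holds
 * nl_table_lock and spins on priv->lock (the reported case).
 * hardirq_on_cpu0 == 0: the isr cannot run there, so cpu 0 waits on nothing.
 */
int scenario(int hardirq_on_cpu0)
{
    reset_state();
    holder[NL_TABLE_LOCK] = CPU0;   /* netlink_table_grab, write_lock_bh */
    holder[PRIV_LOCK] = CPU1;       /* ipw_irq_tasklet */
    waiting[CPU1] = NL_TABLE_LOCK;  /* netlink_broadcast, read side */
    if (hardirq_on_cpu0)
        waiting[CPU0] = PRIV_LOCK;  /* isr wants priv->lock */
    return in_wait_cycle(CPU0);
}
```

Calling scenario(1) models the case described above, where each cpu waits on
a lock the other one holds; scenario(0) models the same state without the
hardirq on cpu 0, where cpu 0 is free to drop nl_table_lock.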

--
powered by bash/screen/(urxvt/fvwm|linux-console)/gentoo/gnu/linux OS