Re: 2.6.34-rc1: rcu lockdep bug?

From: Américo Wang
Date: Mon Mar 15 2010 - 05:39:27 EST


2010/3/15 Américo Wang <xiyou.wangcong@xxxxxxxxx>:
> 2010/3/15 Américo Wang <xiyou.wangcong@xxxxxxxxx>:
>> On Sat, Mar 13, 2010 at 01:58:38PM -0800, Paul E. McKenney wrote:
>>>On Sat, Mar 13, 2010 at 01:33:56PM +0800, Américo Wang wrote:
>>>> On Fri, Mar 12, 2010 at 02:37:38PM +0100, Eric Dumazet wrote:
>>>> >On Friday 12 March 2010 at 21:11 +0800, Américo Wang wrote:
>>>> >
>>>> >> Oh, but lockdep complains about rcu_read_lock(), it said
>>>> >> rcu_read_lock() can't be used in softirq context.
>>>> >>
>>>> >> Am I missing something?
>>>> >
>>>> >Well, lockdep might be dumb, I don't know...
>>>> >
>>>> >I suggest you read the rcu_read_lock_bh kernel doc:
>>>> >
>>>> >/**
>>>> > * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
>>>> > *
>>>> > * This is equivalent of rcu_read_lock(), but to be used when updates
>>>> > * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
>>>> > * consider completion of a softirq handler to be a quiescent state,
>>>> > * a process in RCU read-side critical section must be protected by
>>>> > * disabling softirqs. Read-side critical sections in interrupt context
>>>> > * can use just rcu_read_lock().
>>>> > *
>>>> > */
>>>> >
>>>> >
>>>> >Last sentence being perfect :
>>>> >
>>>> >Read-side critical sections in interrupt context
>>>> >can use just rcu_read_lock().
>>>> >
>>>>
>>>> Yeah, right, then it is more likely to be a bug of rcu lockdep.
>>>> Paul is looking at it.
>>>
>>>Except that it seems to be working correctly for me...
>>>
>>
>> Hmm, then I am confused. The only possibility here is that this is
>> a lockdep bug...
>>
>
> I believe so...
>
> Peter, this looks odd:
>
> kernel: (usbfs_mutex){+.?...}, at: [<ffffffff8146419f>] netif_receive_skb+0xe7/0x819
>
> netif_receive_skb() never has a chance to take usbfs_mutex. How can this
> come about?
>

OK, I think I found what lockdep is really complaining about: we take a
spinlock with spin_lock() in netpoll_poll_lock(), which runs with hardirqs
enabled, and later we take another spinlock with spin_lock_irqsave() in
netpoll_rx(), so lockdep thinks we broke the locking rules.

I don't know why netpoll_rx() needs irqs disabled; it looks like no one
takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock)
instead? Or am I missing something here? Eric? David?
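
To make the question concrete, here is a toy userspace model of the two
variants. It is not kernel code: every toy_* name, the irqs_enabled flag and
the two rx_path_* functions are made up for illustration, and it only tracks
whether "hardirqs" are on while rx_lock is held. It does not model lockdep.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Stands in for the local CPU's hardirq-enable state. */
static bool irqs_enabled = true;

struct toy_lock {
	bool held;
};

/* Models spin_lock(): takes the lock and leaves the irq state alone. */
static void toy_spin_lock(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_spin_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Models spin_lock_irqsave(): saves the irq state, disables irqs, locks. */
static void toy_spin_lock_irqsave(struct toy_lock *l, bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
	toy_spin_lock(l);
}

/* Models spin_unlock_irqrestore(): unlocks, then restores the saved state. */
static void toy_spin_unlock_irqrestore(struct toy_lock *l, bool flags)
{
	toy_spin_unlock(l);
	irqs_enabled = flags;
}

/*
 * Current shape as described above: poll_lock is taken with irqs on,
 * then rx_lock is taken with irqsave.
 * Returns whether irqs were enabled while rx_lock was held.
 */
static bool rx_path_irqsave(void)
{
	struct toy_lock poll_lock = { false }, rx_lock = { false };
	bool flags, irqs_on_inside;

	toy_spin_lock(&poll_lock);
	toy_spin_lock_irqsave(&rx_lock, &flags);
	irqs_on_inside = irqs_enabled;
	toy_spin_unlock_irqrestore(&rx_lock, flags);
	toy_spin_unlock(&poll_lock);
	return irqs_on_inside;
}

/* Proposed shape: both locks are taken with plain spin_lock(). */
static bool rx_path_plain(void)
{
	struct toy_lock poll_lock = { false }, rx_lock = { false };
	bool irqs_on_inside;

	toy_spin_lock(&poll_lock);
	toy_spin_lock(&rx_lock);
	irqs_on_inside = irqs_enabled;
	toy_spin_unlock(&rx_lock);
	toy_spin_unlock(&poll_lock);
	return irqs_on_inside;
}
```

In this model rx_path_irqsave() holds rx_lock with irqs off and
rx_path_plain() holds it with irqs on. The plain variant would only be safe
in the real code if nothing ever takes rx_lock from hardirq context, which is
exactly the assumption I am asking about.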

Thanks!