Re: deadlock in synchronize_srcu() in debugfs?

From: Johannes Berg
Date: Thu Mar 23 2017 - 11:47:11 EST


Hi,

> Not yet. How reproducible is this?

Apparently quite. I haven't tried myself - it happens during some
automated test that I need to analyse further.

> > We're observing that with our (backported, but very recent) driver
> > against 4.9 (and 4.10, I think),
>
> Do I understand it correctly that this driver has been backported
> from 4.11-rcX to 4.9/10

Yes.

> and that there isn't any issue with 4.11-rcX?

No, I can't say that; we haven't run that test on 4.11-rcX.

> > but there are no backports of any debugfs things so the backport
> > itself doesn't seem like a likely problem.
>
> Right, there haven't been any SRCU related changes to debugfs after
> 4.8.

Right.

> > sysrq-w shows a lot of tasks blocked on various locks (e.g. RTNL),
> > but the ultimate problem is the wireless stack getting blocked on
> > debugfs_remove_recursive(), in __synchronize_srcu(), in
> > wait_for_completion() (while holding lots of locks, hence the other
> > tasks getting stuck).
>
> Could you share a complete backtrace? For example, is the
> debugfs_remove_recursive() called from any debugfs file's fops and
> thus, possibly from within a SRCU read side critical section?

No, it's called from netlink:

[  884.634857] wpa_supplicant  D    0  1769   1005 0x00000000
[  884.634874]  0000000000000000 ffff8ca50633d140 ffff8ca507b219c0 ffff8ca5455d4cc0
[  884.634898]  ffff8ca54f599d98 ffff97df431c36a0 ffffffff878dadf3 ffff8ca500000001
[  884.634927]  81ed67337c8469e4 ffff8ca54f599d98 0000932a07b219c0 ffff8ca507b219c0
[  884.634952] Call Trace:
[  884.634969]  [<ffffffff878dadf3>] ? __schedule+0x303/0xb00
[  884.634985]  [<ffffffff878db62d>] schedule+0x3d/0x90
[  884.635002]  [<ffffffff878e022c>] schedule_timeout+0x2fc/0x600
[  884.635021]  [<ffffffff870e8b06>] ? mark_held_locks+0x66/0x90
[  884.635041]  [<ffffffff878e16bc>] ? _raw_spin_unlock_irq+0x2c/0x40
[  884.635059]  [<ffffffff878dc8cc>] wait_for_completion+0xdc/0x110
[  884.635073]  [<ffffffff870bff90>] ? wake_up_q+0x80/0x80
[  884.635091]  [<ffffffff8710a46e>] __synchronize_srcu+0x11e/0x1c0
[  884.635109]  [<ffffffff87109510>] ? trace_raw_output_rcu_utilization+0x60/0x60
[  884.635131]  [<ffffffff8710a542>] synchronize_srcu+0x32/0x40
[  884.635145]  [<ffffffff873899ed>] debugfs_remove_recursive+0x17d/0x190
[  884.635239]  [<ffffffffc087b3be>] ieee80211_debugfs_key_remove+0x1e/0x30 [mac80211]
[  884.635333]  [<ffffffffc0840773>] __ieee80211_key_destroy+0x1b3/0x480 [mac80211]
[  884.635440]  [<ffffffffc0841807>] ieee80211_free_sta_keys+0x117/0x170 [mac80211]
[  884.635524]  [<ffffffffc0807b0c>] __sta_info_destroy_part2+0x4c/0x200 [mac80211]
[  884.635597]  [<ffffffffc0807fbd>] __sta_info_flush+0x10d/0x1a0 [mac80211]
[  884.635706]  [<ffffffffc086634b>] ieee80211_set_disassoc+0xcb/0x530 [mac80211]
[  884.635802]  [<ffffffffc086e3b6>] ieee80211_mgd_deauth+0x2e6/0x7b0 [mac80211]
[  884.635901]  [<ffffffffc08237c8>] ieee80211_deauth+0x18/0x20 [mac80211]
[  884.636024]  [<ffffffffc0673e8f>] cfg80211_mlme_deauth+0x14f/0x3b0 [cfg80211]
[  884.636110]  [<ffffffffc0649265>] nl80211_deauthenticate+0xe5/0x130 [cfg80211]
[  884.636133]  [<ffffffff877dc52c>] genl_family_rcv_msg+0x1bc/0x370
[  884.636151]  [<ffffffff877dc6e0>] ? genl_family_rcv_msg+0x370/0x370
[  884.636262]  [<ffffffff877dc760>] genl_rcv_msg+0x80/0xc0
[  884.636275]  [<ffffffff877dba87>] netlink_rcv_skb+0xa7/0xc0
[  884.636289]  [<ffffffff877dc148>] genl_rcv+0x28/0x40
[  884.636303]  [<ffffffff877db45b>] netlink_unicast+0x15b/0x210
[  884.636318]  [<ffffffff877db82a>] netlink_sendmsg+0x31a/0x3a0
[  884.636335]  [<ffffffff8777bb48>] sock_sendmsg+0x38/0x50
[  884.636354]  [<ffffffff8777c41c>] ___sys_sendmsg+0x26c/0x280
[  884.636378]  [<ffffffff8717b042>] ? ring_buffer_unlock_commit+0x32/0x290
[  884.636393]  [<ffffffff8718122e>] ? __buffer_unlock_commit+0x1e/0x40
[  884.636407]  [<ffffffff87181d12>] ? tracing_mark_write+0x162/0x2b0
[  884.636423]  [<ffffffff870e7419>] ? __lock_is_held+0x49/0x70
[  884.636440]  [<ffffffff8777d0a5>] __sys_sendmsg+0x45/0x80
[  884.636459]  [<ffffffff8777d0f2>] SyS_sendmsg+0x12/0x20
[  884.636477]  [<ffffffff878e1e45>] entry_SYSCALL_64_fastpath+0x23/0xc6


johannes