Re: deadlock in synchronize_srcu() in debugfs?

From: Nicolai Stange
Date: Thu Mar 23 2017 - 11:36:32 EST


Hi Johannes,

On Thu, Mar 23 2017, Johannes Berg wrote:

> Before I go hunting - has anyone seen a deadlock in synchronize_srcu()
> in debugfs_remove() before?

Not yet. How reproducible is this?


> We're observing that with our (backported, but very recent) driver
> against 4.9 (and 4.10, I think),

Do I understand correctly that this driver has been backported from
4.11-rcX to 4.9/4.10 and that there isn't any issue with 4.11-rcX?


> but there are no backports of any debugfs things so the backport
> itself doesn't seem like a likely problem.

Right, there haven't been any SRCU-related changes to debugfs after
4.8.


> sysrq-w shows a lot of tasks blocked on various locks (e.g. RTNL), but
> the ultimate problem is the wireless stack getting blocked on
> debugfs_remove_recursive(), in __synchronize_srcu(), in
> wait_for_completion() (while holding lots of locks, hence the other
> tasks getting stuck).

Could you share a complete backtrace? For example, is
debugfs_remove_recursive() called from any debugfs file's fops and thus
possibly from within an SRCU read-side critical section?
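
To illustrate the scenario I have in mind, here is a minimal userspace
toy model, not kernel code. It assumes that a file operation runs inside
a read-side critical section and that removal has to wait for all
readers to leave. All toy_* names are made up for this sketch, and the
"would block" return value merely stands in for a sleep in
wait_for_completion().

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the SRCU domain: number of active read-side sections. */
static int toy_readers;

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	toy_readers--;
}

/*
 * Stand-in for synchronize_srcu(): the real call sleeps until all
 * pre-existing readers have left. This model only reports whether
 * it would have to wait for a reader.
 */
static bool toy_synchronize_would_block(void)
{
	return toy_readers > 0;
}

/* Stand-in for debugfs_remove_recursive(): removal waits for readers. */
static bool toy_remove_would_block(void)
{
	return toy_synchronize_would_block();
}

/* A fops handler that removes a debugfs entry while being a reader itself. */
static bool toy_fops_handler_removes_file(void)
{
	bool stuck;

	toy_read_lock();			/* entered before the handler body runs */
	stuck = toy_remove_would_block();	/* waits for a reader that is the caller */
	toy_read_unlock();			/* never reached in the real scenario */
	return stuck;
}

/* Removal from outside of any file operation: no reader, no wait. */
static bool toy_remove_from_outside(void)
{
	return toy_remove_would_block();
}
```

In this single-threaded model, toy_fops_handler_removes_file() reports
that it would block while toy_remove_from_outside() does not. In the
real kernel, other readers eventually leave, but a reader waiting on
itself never does, which is why the complete backtrace matters.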


Thanks,

Nicolai