Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances

From: Johannes Berg
Date: Tue Apr 18 2017 - 09:40:41 EST


On Tue, 2017-04-18 at 06:31 -0700, Paul E. McKenney wrote:
> On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> > On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> >
> > > If you have not already done so, please run this with debug enabled,
> > > especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
> > > This is important because there are configurations for which the
> > > deadlocks you saw with SRCU turn into silent failure, including
> > > memory corruption. CONFIG_PROVE_RCU=y will catch many of those
> > > situations.
> >
> > Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> > enabled in the builds where we saw the problem, but I'm not sure.
>
> CONFIG_PROVE_RCU=y will reliably catch things like this:
>
> 1.	rcu_read_lock();
> 	synchronize_rcu();
> 	rcu_read_unlock();

Ok, that's not something that happens here either.

> 2.	rcu_read_lock();
> 	schedule_timeout_interruptible(HZ);
> 	rcu_read_unlock();

Neither is this happening.

> There are more, but this should get you the flavor of the types
> of bugs CONFIG_PROVE_RCU=y can locate for you.

Makes sense. However, the issue at hand is what we (you and I)
discussed earlier wrt. lockdep -- from SRCU's point of view everything
is actually OK, except that the one thread is waiting for something and
we can never finish the grace period, and thus synchronize_srcu() will
never return. But there's no real SRCU bug here.

> > Nicolai probably never even ran into this problem, though it should
> > be easy to reproduce.
>
> I am just worried that the situation resulting in the earlier SRCU
> deadlocks might be hiding behind CONFIG_PROVE_RCU=n,
> CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n.  Or some other bug
> hiding behind some other set of Kconfig options.

There's no SRCU deadlock though. I know exactly why it happens, in my
case, which is the following:

Thread 1:
    userspace: read(debugfs_file_1)
    srcu_read_lock(&debugfs_srcu);    // in debugfs bowels
    wait_event_interruptible(...);    // in my driver's debugfs read method

Thread 2:
    debugfs_remove(debugfs_file_2);
    synchronize_srcu(&debugfs_srcu);  // in debugfs bowels
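
The Thread 1 / Thread 2 sequence above can be imitated in userspace. What
follows is only a rough model with POSIX threads, not debugfs or SRCU code:
the reader counter, the `event` flag, and the names reader(),
model_synchronize() and run_model() are all made up for illustration, and
model_synchronize() takes a timeout (which synchronize_srcu() does not have)
purely so that the demo terminates.

```c
/* Userspace model with POSIX threads; this is not the kernel's SRCU code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int readers;	/* models readers inside srcu_read_lock() */
static bool event;	/* models the condition the read method waits for */

/* Models Thread 1: enter the read side, then sleep until 'event' is set. */
static void *reader(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock);
	readers++;				/* srcu_read_lock() */
	pthread_cond_broadcast(&cond);
	while (!event)				/* wait_event_interruptible() */
		pthread_cond_wait(&cond, &lock);
	readers--;				/* srcu_read_unlock() */
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/*
 * Models synchronize_srcu(), except that it gives up after timeout_ms.
 * Returns true if every reader left before the deadline.
 */
static bool model_synchronize(long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&lock);
	while (readers > 0 && rc == 0)
		rc = pthread_cond_timedwait(&cond, &lock, &deadline);
	done = (readers == 0);
	pthread_mutex_unlock(&lock);
	return done;
}

/* Runs the scenario and reports whether each "grace period" finished. */
static void run_model(bool *done_while_blocked, bool *done_after_wakeup)
{
	pthread_t t;
	int rc = pthread_create(&t, NULL, reader, NULL);

	assert(rc == 0);
	(void)rc;

	pthread_mutex_lock(&lock);
	while (readers == 0)			/* let the reader get in first */
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);

	/* Models Thread 2 while the reader is still asleep. */
	*done_while_blocked = model_synchronize(100);

	pthread_mutex_lock(&lock);
	event = true;				/* stands in for the wake-up */
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);

	*done_after_wakeup = model_synchronize(5000);
	pthread_join(t, NULL);
}
```

Calling run_model() once from main() should report false for the first wait
and true for the second. In the kernel there is no timeout, so Thread 2
simply stays blocked for as long as Thread 1's wait lasts.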


This is the live-lock. The deadlock is something I posited but never
ran into:

CPU 1                               CPU 2
srcu_read_lock(&debugfs_srcu);
                                    rtnl_lock();
rtnl_lock();
                                    synchronize_srcu(&debugfs_srcu);

Again, no (S)RCU abuse here, just an ABBA deadlock.
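
The CPU 1 / CPU 2 ordering above has the same shape as a classic two-mutex
ABBA deadlock, which can be sketched in userspace. This is an approximation
under loud assumptions: the SRCU read-side section is modelled as a plain
mutex (real SRCU readers do not exclude one another, so this overstates it),
model_srcu, model_rtnl, cpu_thread() and run_abba() are hypothetical names,
and pthread_mutex_timedlock() is used only so that the demo returns instead
of hanging.

```c
/* Userspace ABBA model with POSIX threads; not kernel code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t model_srcu = PTHREAD_MUTEX_INITIALIZER;	/* "A" */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;	/* "B" */
static pthread_barrier_t barrier;

struct cpu {
	pthread_mutex_t *first;		/* taken unconditionally */
	pthread_mutex_t *second;	/* attempted with a 100 ms timeout */
	int result;			/* 0 on success, else an errno value */
};

static void *cpu_thread(void *arg)
{
	struct cpu *c = arg;
	struct timespec deadline;

	pthread_mutex_lock(c->first);
	pthread_barrier_wait(&barrier);	/* both threads hold their first lock */

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 100 * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	c->result = pthread_mutex_timedlock(c->second, &deadline);

	pthread_barrier_wait(&barrier);	/* nobody unlocks until both have tried */
	if (c->result == 0)
		pthread_mutex_unlock(c->second);
	pthread_mutex_unlock(c->first);
	return NULL;
}

/* Runs both "CPUs" and reports what each second lock attempt returned. */
static void run_abba(int *cpu1_result, int *cpu2_result)
{
	struct cpu c1 = { &model_srcu, &model_rtnl, 0 };	/* A then B */
	struct cpu c2 = { &model_rtnl, &model_srcu, 0 };	/* B then A */
	pthread_t t1, t2;
	int rc;

	rc = pthread_barrier_init(&barrier, NULL, 2);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, cpu_thread, &c1);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, cpu_thread, &c2);
	assert(rc == 0);
	(void)rc;

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_barrier_destroy(&barrier);

	*cpu1_result = c1.result;
	*cpu2_result = c2.result;
}
```

Because neither thread releases its first lock until both attempts have
returned, both attempts should time out with ETIMEDOUT. Without the timeout,
both threads would wait forever, which is the posited deadlock.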

johannes