Re: [Lse-tech] Re: [PATCH 0/7]: Fix for unsafe notifier chain

From: Paul E. McKenney
Date: Sun Nov 27 2005 - 21:48:02 EST


On Sun, Nov 27, 2005 at 02:03:29PM -0800, Greg KH wrote:
> On Sun, Nov 27, 2005 at 11:56:40AM -0800, Andrew Morton wrote:
> > We're saying that kernel/sys.c:notifier_lock should be removed and that all
> > callers of notifier_chain_register(), notifier_chain_unregister() and
> > notifier_call_chain() should be changed to define and use their own lock.
> >
> > So the _callers_ get to decide whether they're going to use down(),
> > spin_lock(), down_read(), read_lock(), preempt_disable(), local_irq_disable()
> > or whatever.
>
> I completely agree. I just watched in mute horror as Chandra and Alan
> spun off into the rcu blackhole trying to create one-size-fits-all
> notifiers.

RCU as blackhole? I am impressed -- even -I- don't find RCU to have the
same limiting-case attraction that a black hole does. ;-)

(My apologies, Greg, but you did leave yourself open for this!)

> Making the user do the locking it needs to do is simple and sane.
>
> And the reason USB's notifiers are implemented correctly, is they don't
> use the notifier core, but rather, reimplemented their own, due to the
> locking mess.

The locking mess is due to the fact that the notifier chain itself
must be protected by something or another, right? Here are the options
I have come across:

1. The notifier chain is guarded by a separate notifier-chain lock.
We then have the following deadlock situation:

o The subsystem requesting the notifier likely has
to hold one of its own locks when registering and
unregistering, which means that the notifier lock
must be subordinate to the subsystem lock.

o But when invoking the notifiers, the notifier lock
must be held while walking the chain. Since the
subsystem being notified likely has to acquire one
of its own locks, the subsystem lock must be subordinate
to the notifier lock.

There are a number of ways of breaking this deadlock, some of
which are listed below. Note that this deadlock situation can
arise both for spinlocks and for sleeplocks.

I believe that this deadlock situation is the core reason why
notifier locking is so difficult to get right.

Even ignoring the deadlock, this does not work for NMIs.

2. Each element of the notifier chain is guarded by reference
counts, and the chain itself is guarded by a lock. Each
element of the chain holds a reference to the next element
in the chain, and new elements are always added to the end
of the chain. The traversal code looks something like the
following:

spin_lock(&chain_lock);
p = head;
atomic_inc(&p->refcnt);
while (p != NULL) {
spin_unlock(&chain_lock);
p->func(p->arg);
spin_lock(&chain_lock);
q = p->next;
atomic_inc(&q->refcnt);
if (atomic_dec_and_test(&p->refcnt)) {
kfree(p);
}
p = q;
}
spin_unlock(&chain_lock);

Note that all actual deletion is done under chain_lock.

This works (I think...), and the subsystem code can use a
single lock that the notifier lock (chain_lock in this case)
is subordinate to. But the notifier traversal loop is quite
heavyweight. And it cannot be invoked from NMI handlers without
risk of self-deadlock.

3. Like #1, but require that the subsystem have at least two locks,
one of which is subordinate to the notifier lock (and is acquired
from the notifier callback function), and the other of which is
superior to the notifier lock, and is held when registering and
deregistering callbacks. This can work, but could significantly
complicate the subsystem locking, and also will require that some
operations acquire two locks instead of just one, since one must
hold both to exclude both notifications and register/unregister
operations.

This approach also fails to handle notifications from NMIs.

4. "Just say no" to a separate notifier-chain lock, and guard
the hole thing with whatever mechanism the subsystem
deems appropriate. This can be made to work with NMI-based
notification, since such subsystems can use RCU or whatever
other lock-free mechanism they desire.

I believe that Greg took this "just say no" approach in USB,
but could easily be missing something.

This works with minimal added complexity to the subsystem,
but requires that each subsystem have its own notifier
chain, since it does not make sense to have a single chain
guarded by multiple locks. But it means that notifier
code must be replicated in a number of subsystems, which
seems sub-optimal.

This might be the best we can do, but in the interest of
completeness...

5. Use RCU to traverse the notifier chain and use simple locking
to guard updates, similar to Alan's and Chandra's patch.
This avoids the deadlock, and handles NMIs nicely, but does
not tolerate synchronous notification callbacks that sleep.
(Cases that can tolerate asynchronous notification can make use
of workqueues or similar mechanisms to defer the sleeping code
so that it is not executed in the scope of the rcu_read_lock()
protecting the notifier-chain traversal.)

6. Have separate mechanisms for the non-sleeping and the synchronous
sleeping cases. For example, #5 for non-sleeping and either #2,
#3, or #4 for the synchronous sleeping case.

This works in all cases, and achieves a high degree of code
commonality, but does require two separate APIs.

7. Use wait-free synchronization everywhere. This has some issues
with complexity, last I checked.

8. Come up with a safe and sane RCU implementation that tolerates
general blocking in read-side critical sections. Note that
although some realtime implementations of RCU do tolerate
blocking, they do so only in the special case that priority
inheritance applies to. Might happen,
but I would not recommend holding your breath.

Any options I missed?

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/