Re: [Lse-tech] Re: [PATCH 0/7]: Fix for unsafe notifier chain

From: Keith Owens
Date: Sun Nov 27 2005 - 20:23:12 EST

Next message: Rob Landley: "Question: madvise(DONT_SYNC)"
Previous message: Robert Hancock: "Re: What's wrong with this really simple function?"
In reply to: Paul E. McKenney: "Re: [Lse-tech] Re: [PATCH 0/7]: Fix for unsafe notifier chain"
Next in thread: Chandra Seetharaman: "Re: [PATCH 0/7]: Fix for unsafe notifier chain"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, 27 Nov 2005 11:56:40 -0800,
Andrew Morton <akpm@xxxxxxxx> wrote:
>We're saying that kernel/sys.c:notifier_lock should be removed and that all
>callers of notifier_chain_register(), notifier_chain_unregister() and
>notifier_call_chain() should be changed to define and use their own lock.

We are arguing past each other's assumptions, so I am laying out my
assumptions in the hope that it will clear the air.

* Traversal of the notifier callback chain can race against
registration and unregistration on the same chain, some form of
protection is required to remove this race. The only reason that it
has not bitten us so far is that registration and unregistration of
notifier callbacks are rare events.

* There are two major classes of notifier callbacks. One class calls
routines that need to sleep, the other class is called in a context
where it cannot sleep. There is no one size that fits all types of
notifier callbacks, nor do the suggested patches claim that one size
fits all.

* If the callbacks can sleep then clearly the lock type for that class
must be a sleeping lock. This class is not a problem.

* The callbacks that are called from non-sleeping contexts vary in
their requirements, but the hardest problem is callbacks from
non-maskable interrupts. By definition, no amount of locking can
protect against NMI, so the list traversal for this class must be
lock free. RCU is only one example of lock free code, any lock free
algorithm would do the job.

* Lock free list traversal has to be different from normal list
traversal. IOW it is not enough to push the responsibility for
locking onto the caller of notifier_chain_{un,}register, we need a
different version of notifier_call_chain() as well, therefore
kernel/sys.c has to know what type of chain it is traversing.

* Once all the work has been done for the hard case of NMI, it makes
sense to reuse that lock free code for all the other chains which are
traversed in non-sleeping contexts. IOW, adding list traversal code
for different variants of spinlock, preempt disable etc. is just
adding more code for no benefit. This applies whether the locking
(if any) is done in the caller or in kernel/sys.c, in either case it
just duplicates code.

* If you accept that the lock free list traversal should not be
duplicated across several callers (and several architectures) then
the same argument applies to the class of callbacks that can sleep.
Why duplicate that code or distribute the locks when a single lock in
one place will do? I have not seen any performance data that says
that the lock for the sleeping notifier chains is any sort of
bottleneck, which would be the only justfication for splitting the
sleeping lock.

* You could argue (and I might even agree) that folding the two classes
of callback into a single set of registration routines is confusing
and there should be separate routines for
notifier_chain_blocking_register, notifier_chain_blocking_unregister,
notifier_call_blocking_chain,
notifier_chain_atomic_register, notifier_chain_atomic_unregister,
notifier_call_atomic_chain.
However that is an implementation detail. Adding the type flag to
the notifier chain head minimizes the changes to the existing code.
I could live with separate routines for blocking and atomic notifier
chains, even though it requires more work to patch it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rob Landley: "Question: madvise(DONT_SYNC)"
Previous message: Robert Hancock: "Re: What's wrong with this really simple function?"
In reply to: Paul E. McKenney: "Re: [Lse-tech] Re: [PATCH 0/7]: Fix for unsafe notifier chain"
Next in thread: Chandra Seetharaman: "Re: [PATCH 0/7]: Fix for unsafe notifier chain"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]