Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCU implementation
From: Manfred Spraul
Date: Sun Aug 31 2008 - 06:58:29 EST
Paul E. McKenney wrote:
Perhaps it's possible to rely on CPU_DYING, but I haven't figured out yet
how to handle read-side critical sections in CPU_DYING handlers.
Interrupts after CPU_DYING could be handled by rcu_irq_enter(),
rcu_irq_exit() [yes, they exist on x86: the arch code enables the local
interrupts in order to process the currently queued interrupts]
My feeling is that CPU online/offline will be quite rare, so it should
be OK to clean up after the races in force_quiescent_state(), which in
this version is called every three ticks in a given grace period.
If you add failing cpu offline calls, then the problem appears to be
unsolvable:
If I get it right, the offlining process looks like this:
* one cpu in the system makes the CPU_DOWN_PREPARE notifier call. These
calls can sleep (e.g. slab sleeps on semaphores). The cpu that goes
offline is still alive, still doing arbitrary work. cpu_quiet calls on
behalf of the cpu would be wrong.
* stop_machine: all cpus schedule to a special kernel thread [1], only
the dying cpu runs.
* The cpu that goes offline calls the CPU_DYING notifiers.
* __cpu_disable(): The cpu that goes offline checks whether it's possible
to offline the cpu. At least on i386, this can fail.
On success:
* at least on i386: the cpu that goes offline handles outstanding
interrupts. I'm not sure, perhaps even softirqs are handled.
* the cpu stops handling interrupts.
* stop machine leaves, the remaining cpus continue their work.
* The CPU_DEAD notifiers are called. They can sleep.
On failure:
* all cpus continue their work. call_rcu, synchronize_rcu(), ...
* some time later: the CPU_DOWN_FAILED callbacks are called.
Is that description correct?
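
To make the ordering concrete, here is a minimal userspace sketch in C of
the sequence as I described it above. It is not kernel code: the enum
names and offline_attempt() are made up for illustration, and disable_ok
simply stands in for the result of __cpu_disable(). It only records which
notifier events one offline attempt would deliver, and in which order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Notifier events named in the description above (illustrative enum). */
enum hotplug_event {
	EV_DOWN_PREPARE,
	EV_DYING,
	EV_DEAD,
	EV_DOWN_FAILED,
};

/*
 * Hypothetical helper: fill 'trace' with the events delivered for one
 * offline attempt and return how many entries were written.
 * 'disable_ok' stands in for the result of __cpu_disable().
 * Returns 0 if 'trace' has room for fewer than three entries.
 */
size_t offline_attempt(bool disable_ok, enum hotplug_event *trace, size_t max)
{
	size_t n = 0;

	assert(trace != NULL);
	if (max < 3)
		return 0;

	/* Called from another cpu, may sleep; the target still runs normally. */
	trace[n++] = EV_DOWN_PREPARE;

	/* Inside stop_machine, on the target cpu, before __cpu_disable(). */
	trace[n++] = EV_DYING;

	/* After stop_machine has left: the outcome decides the last event. */
	trace[n++] = disable_ok ? EV_DEAD : EV_DOWN_FAILED;

	return n;
}
```

Note that in this model the CPU_DYING entry is recorded in both outcomes,
so a CPU_DYING handler alone cannot tell which of the two paths it is on.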
Then:
- treating a cpu as always quiet after the rcu notifier was called with
CPU_DOWN_PREPARE is wrong: the target cpu still runs normal code:
user space, kernel space, interrupts, whatever. The target cpu still
accepts interrupts, thus treating it as "normal" should work.
__cpu_disable() success:
- after CPU_DYING, a cpu is either in an interrupt or outside read-side
critical sections. Parallel synchronize_rcu() calls are impossible until
the cpu is dead. call_rcu() is probably possible.
- The CPU_DEAD notifiers are called. A synchronize_rcu() call from a
notifier that runs before the rcu notifier is possible.
__cpu_disable() failure:
- CPU_DYING is called, but the cpu remains fully alive. The system comes
fully alive again.
- some time later, CPU_DOWN_FAILED is called.
With the current CPU_DYING callback, it's impossible to be both
deadlock-free and race-free with the given conditions. If
__cpu_disable() succeeds, then the cpu must be treated as gone and
always idle. If __cpu_disable() fails, then the cpu must be treated as
fully there. Doing both things at the same time is impossible. Waiting
until CPU_DOWN_FAILED or CPU_DEAD is called is impossible, too: Either
synchronize_rcu() in a CPU_DEAD notifier [called before the rcu
notifier] would deadlock or read-side critical sections on the
not-killed cpu would race.
What about moving the CPU_DYING notifier calls behind the
__cpu_disable() call?
Any other solutions?
Btw, as far as I can see, rcupreempt would deadlock if a CPU_DEAD
notifier uses synchronize_rcu().
Probably no one will ever succeed in triggering the deadlock:
- cpu goes offline.
- the other cpus in the system are restarted.
- one cpu does the CPU_DEAD notifier calls.
- before the rcu notifier is called with CPU_DEAD:
- one CPU_DEAD notifier sleeps.
- while that CPU_DEAD notifier sleeps, kmem_cache_destroy() is called on
the same cpu. get_online_cpus() immediately succeeds.
- kmem_cache_destroy acquires the cache_chain_mutex.
- kmem_cache_destroy does synchronize_rcu(), it sleeps.
- CPU_DEAD processing continues, the slab CPU_DEAD notifier tries to
acquire the cache_chain_mutex. It sleeps, too.
--> deadlock, because the already dead cpu will never signal itself as
quiet. Thus synchronize_rcu() will never succeed, thus the slab CPU_DEAD
notifier will never return, thus rcu_offline_cpu() is never called.
--
Manfred
[1] open question: with rcu_preempt, is it possible that these cpus
could be inside read-side critical sections?