Re: [PATCH RFC] v7 expedited "big hammer" RCU grace periods

From: Paul E. McKenney
Date: Wed May 27 2009 - 00:30:26 EST


On Wed, May 27, 2009 at 09:57:19AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> >
> > I am concerned about the following sequence of events:
> >
> > o synchronize_sched_expedited() disables preemption, thus blocking
> > offlining operations.
> >
> > o CPU 1 starts offlining CPU 0. It acquires the CPU-hotplug lock,
> > and proceeds, and is now waiting for preemption to be enabled.
> >
> > o synchronize_sched_expedited() disables preemption, sees
> > that CPU 0 is online, so initializes and queues a request,
> > does a wake_up_process(), and finally does a preempt_enable().
> >
> > o CPU 0 is currently running a high-priority real-time process,
> > so the wakeup does not immediately happen.
> >
> > o The offlining process completes, including the kthread_stop()
> > to the migration task.
> >
> > o The migration task wakes up, sees kthread_should_stop(),
> > and so exits without checking its queue.
> >
> > o synchronize_sched_expedited() waits forever for CPU 0 to respond.
> >
> > I suppose that one way to handle this would be to check for the CPU
> > going offline before doing the wait_for_completion(), but I am concerned
> > about races affecting this check as well.
> >
> > Or is there something in the CPU-offline process that makes the above
> > sequence of events impossible?
> >
> > Thanx, Paul
> >
> >
>
> I was aware of this, which is why I wrote:
> >
> > The coupling between synchronize_sched_expedited() and migration_req
> > is greatly increased:
> >
> > 1) The offline cpu's per_cpu(rcu_migration_req, cpu) is handled.
> > See migration_call::CPU_DEAD
>
> synchronize_sched_expedited() will not wait for CPU#0, because
> migration_call()::case CPU_DEAD wakes up the requestors.
>
> migration_call()
> {
> 	...
> 	case CPU_DEAD:
> 	case CPU_DEAD_FROZEN:
> 		...
> 		/*
> 		 * No need to migrate the tasks: it was best-effort if
> 		 * they didn't take sched_hotcpu_mutex. Just wake up
> 		 * the requestors.
> 		 */
> 		spin_lock_irq(&rq->lock);
> 		while (!list_empty(&rq->migration_queue)) {
> 			struct migration_req *req;
>
> 			req = list_entry(rq->migration_queue.next,
> 					 struct migration_req, list);
> 			list_del_init(&req->list);
> 			spin_unlock_irq(&rq->lock);
> 			complete(&req->done);
> 			spin_lock_irq(&rq->lock);
> 		}
> 		spin_unlock_irq(&rq->lock);
> 		...
> 	...
> }
>
> My approach depends on the requestors being woken up in every case.
> migration_call() does this for us, but at the cost of greatly
> increased coupling.

OK, good point! I do need to think about this.
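
For reference, the drain-on-CPU_DEAD pattern quoted above can be
modeled in userspace roughly as follows. This is only a sketch under
stated assumptions, not the kernel code: sketch_req, sketch_queue,
sketch_enqueue() and sketch_drain() are hypothetical names, a plain
bool stands in for complete(&req->done), and the rq->lock handling is
omitted entirely, so it says nothing about the races discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct migration_req. */
struct sketch_req {
	struct sketch_req *next;
	bool done;		/* stands in for complete(&req->done) */
};

/* Hypothetical stand-in for rq->migration_queue. */
struct sketch_queue {
	struct sketch_req *head;
};

/* Queue a request, as a requestor would before waiting on it. */
static void sketch_enqueue(struct sketch_queue *q, struct sketch_req *req)
{
	assert(q != NULL && req != NULL);
	req->done = false;
	req->next = q->head;
	q->head = req;
}

/*
 * Models the CPU_DEAD loop: unlink every pending request and mark it
 * done, so that no requestor is left waiting on a CPU that is gone.
 * Returns the number of requests completed.
 */
static int sketch_drain(struct sketch_queue *q)
{
	int n = 0;

	while (q->head != NULL) {
		struct sketch_req *req = q->head;

		q->head = req->next;	/* list_del_init() analogue */
		req->next = NULL;
		req->done = true;	/* complete() analogue */
		n++;
	}
	return n;
}
```

The point of the model is only that every queued request ends up
marked done after the drain, whether or not the per-CPU worker ever
looked at its queue.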

In the meantime, where do you see a need to run
synchronize_sched_expedited() from within a hotplug CPU notifier?

Thanx, Paul