Re: 2.6.13-rc6-rt6

From: Steven Rostedt
Date: Wed Aug 17 2005 - 12:33:50 EST


On Wed, 2005-08-17 at 08:47 +0200, Ingo Molnar wrote:

> but stop_machine() looks quite preempt-unsafe to begin with. The
> local_irq_disable() would not be needed at all if prior the
> for_each_online_cpu() loop we'd use set_cpus_allowed. The current method
> of achieving 'no preemption' is simply racy even during normal
> CONFIG_PREEMPT.

The code does look flakey, but I think it still works, and it may need
to have a raw_local_irq_disable.

The do_stop is bound to a CPU when it was created, so it doesn't need to
have a set_cpus_allowed.

I guess this is what is happening:

---

You start do_stop in a thread on the cpu that you want to run a function
on. (it's bound to that cpu)

do_stop creates a thread on all the other cpus that will run
stopmachine.

While it creates the threads, the ones that are already created yield in
case a thread wakes up on its cpu, so that that thread can migrate to
its own cpu. (all the threads are at FIFO MAX_RT_PRIO-1, so they should
never be preempted.

After all the threads are created and acknowledge themselves, the
do_stop changes the state to PREPARE.

In stopmachine, when PREPARE is seen, it turns off preemption and stops
yielding.

do_stop then changes the state to DISABLE_IRQ.

In stopmachine, when DISABLE_IRQ is seen, it disables IRQs.

Then the do_stop runs the function that is expected to run in the
STOPPED STATE.

---

Notes:

Each time the state is changed, the ack counter is zeroed and it wont
continue until all the threads acknowledge they hit the expected point.

I'm currious why it needs to go to preempt disable before going to
irqs_disabled? It seems that once it is at the point to go, all
processes should be locked on their CPUS and it would only need to goto
irqs_disable.

I'm also assuming that whatever function is being run expects to have
interrupts off on all CPUs. So a raw_local_irq_disable may be in order.

The comment above the stop_machine (called by do_stop) local_irq_disable
looks incorrect. It says
/* Don't schedule us away at this point, please */ I don't see how it
can be scheduled out if it is running FIFO MAX_RT_PRIO-1. It just
assume that it doesn't want any interrupts to go off.

Do you still see a problem with stop_machine?

-- Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/