Re: 3.9.x: Possible race related to stop_machine leads to lockup.

From: Ben Greear
Date: Tue Jun 04 2013 - 18:13:31 EST


On 06/04/2013 02:18 PM, Ben Greear wrote:
> I've been trying to figure out why I see the migration/* processes
> hang in a busy loop.
>
> While reading stop_machine.c, I think I may have found an
> answer.
>
> The set_state() function sets smdata->thread_ack to the current number
> of threads. Each thread's state machine then decrements it, and the
> thread that brings it to zero bumps the state to the next level. This
> seems to keep all CPUs stepping through the states in lock-step.
>
> But, from what I can tell, the __stop_machine() function can
> (re)set the state to STOPMACHINE_PREPARE while the migration
> processes are in their loop. That would explain why they sometimes
> loop forever.
>
> Does this make sense?

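Err, no, that doesn't make sense: 'smdata' is on the stack of each
__stop_machine() call, so a later call can't reset the state that an
earlier call's threads are looping on.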

More printk debugging makes it look like one thread just
never notices that smdata->state has been updated by another
thread.
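For reference, the set_state()/ack_state() handshake can be modeled in
userspace roughly as follows. This is a minimal sketch, not the kernel
code: it assumes C11 atomics in place of atomic_t and smp_wmb(), and the
names (struct sm_data, sm_set_state(), sm_ack_state(), the SM_* states)
are hypothetical. It shows why one thread that misses a state change is
enough to stall everyone: each state needs num_threads acks before the
last acker advances it, so a single missing ack leaves thread_ack above
zero and the other threads keep spinning.

```c
/*
 * Userspace model of the stop_machine state handshake. C11 atomics stand
 * in for atomic_t and smp_wmb(); all names here are hypothetical.
 */
#include <assert.h>
#include <stdatomic.h>

enum sm_state { SM_NONE, SM_PREPARE, SM_DISABLE_IRQ, SM_RUN, SM_EXIT };

struct sm_data {
	int num_threads;	/* threads taking part in the handshake */
	atomic_int thread_ack;	/* acks still outstanding for this state */
	atomic_int state;	/* current enum sm_state value */
};

/* Reset the ack counter, then publish the new state with release order. */
static void sm_set_state(struct sm_data *sm, int newstate)
{
	atomic_store_explicit(&sm->thread_ack, sm->num_threads,
			      memory_order_relaxed);
	atomic_store_explicit(&sm->state, newstate, memory_order_release);
}

/* Each thread acks once per state; the last one to ack advances it. */
static void sm_ack_state(struct sm_data *sm)
{
	int prev = atomic_fetch_sub(&sm->thread_ack, 1);

	assert(prev > 0);	/* more acks than threads would be a bug */
	if (prev == 1)		/* we brought the counter to zero */
		sm_set_state(sm, atomic_load(&sm->state) + 1);
}
```

In a single-threaded walk-through with num_threads = 3, two acks leave
the state at SM_PREPARE; the third moves it to SM_DISABLE_IRQ and resets
thread_ack to 3.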

There is this comment in the state machine loop. Maybe cpu_relax()
only does the "chill out" part, and we need something else to make
sure smdata->state is freshly read from the other CPU's cache?

	/* Chill out and ensure we re-read stopmachine_state. */
	cpu_relax();
	if (smdata->state != curstate) {
		curstate = smdata->state;
		...
	}
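On the cpu_relax() question: a polling loop like this only works if
every iteration performs a real load of the shared state. Below is a
minimal userspace sketch, assuming C11 atomics and pthreads stand in for
the kernel primitives; relax(), wait_for_state_change(), demo_state and
run_demo() are hypothetical names. The acquire load forces a fresh read
each time around the loop. relax() imitates what x86's cpu_relax()
appears to be, a "rep; nop" with a "memory" clobber, which also acts as
a compiler barrier. In a kernel of this vintage the closest analogue to
the atomic load would presumably be ACCESS_ONCE(smdata->state).

```c
/*
 * Userspace sketch of a polling loop that re-reads shared state on every
 * iteration. C11 atomics and pthreads stand in for the kernel primitives;
 * all names are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for cpu_relax(): spin-wait hint + compiler barrier. */
static inline void relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");
#else
	atomic_signal_fence(memory_order_seq_cst);
#endif
}

/*
 * Spin until *state differs from curstate and return the new value.
 * The acquire load is a real memory read on every iteration, so the
 * compiler cannot keep a stale copy of *state in a register.
 */
static int wait_for_state_change(atomic_int *state, int curstate)
{
	int s;

	while ((s = atomic_load_explicit(state, memory_order_acquire)) == curstate)
		relax();
	return s;
}

static atomic_int demo_state;	/* stand-in for smdata->state */

/* Another "CPU" that publishes the next state. */
static void *advance_state(void *arg)
{
	(void)arg;
	atomic_store_explicit(&demo_state, 1, memory_order_release);
	return NULL;
}

/* Start the advancing thread, then spin until its update is visible. */
static int run_demo(void)
{
	pthread_t t;
	int rc, seen;

	atomic_store(&demo_state, 0);
	rc = pthread_create(&t, NULL, advance_state, NULL);
	assert(rc == 0);
	seen = wait_for_state_change(&demo_state, 0);
	pthread_join(t, NULL);
	return seen;
}
```

run_demo() returns 1 once the spinning thread has observed the other
thread's store.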

Gah, way out of my league :P

Ben


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
