Re: call_function_many: fix list delete vs add race

From: Peter Zijlstra
Date: Mon Jan 31 2011 - 16:17:16 EST


On Fri, 2011-01-28 at 18:20 -0600, Milton Miller wrote:
> @@ -491,15 +491,17 @@ void smp_call_function_many(const struct
> cpumask_clear_cpu(this_cpu, data->cpumask);
>
> /*
> - * To ensure the interrupt handler gets an complete view
> - * we order the cpumask and refs writes and order the read
> - * of them in the interrupt handler. In addition we may
> - * only clear our own cpu bit from the mask.
> + * We reuse the call function data without waiting for any grace
> + * period after some other cpu removes it from the global queue.
> > + * This means a cpu might find our data block as it is written.
> + * The interrupt handler waits until it sees refs filled out
> + * while its cpu mask bit is set; here we may only clear our
> + * own cpu mask bit, and must wait to set refs until we are sure
> + * previous writes are complete and we have obtained the lock to
> + * add the element to the queue. We use the acquire and release
> + * of the lock as a wmb() -- acquire prevents write moving up and
> + * release requires old writes are visible.

That's wrong:

->foo =
LOCK
UNLOCK
->bar =

can be re-ordered as:

LOCK
->bar =
->foo =
UNLOCK

Only an UNLOCK+LOCK sequence can be considered an MB.

However, I think the code is still OK, because list_add_rcu() implies a
wmb(), so in that respect it's an improvement: we fix a race and
avoid an extra wmb(). But the comment needs an update.

> */
> - smp_wmb();
> -
> - atomic_set(&data->refs, cpumask_weight(data->cpumask));
> -
> raw_spin_lock_irqsave(&call_function.lock, flags);
> /*
> * Place entry at the _HEAD_ of the list, so that any cpu still
> @@ -509,6 +511,8 @@ void smp_call_function_many(const struct
> list_add_rcu(&data->csd.list, &call_function.queue);
> raw_spin_unlock_irqrestore(&call_function.lock, flags);

And this wants to grow a comment noting that it relies on the wmb()
implied by list_add_rcu().

> + atomic_set(&data->refs, cpumask_weight(data->cpumask));
> +
> /*
> * Make the list addition visible before sending the ipi.
> * (IPIs must obey or appear to obey normal Linux cache

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/