Re: [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock

From: Steven Rostedt
Date: Mon Feb 01 2010 - 13:00:15 EST


On Mon, 2010-02-01 at 12:45 -0500, Mathieu Desnoyers wrote:
> * Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:

> It's probably worthwhile to summarize here the mb-related discussions we
> had in the previous RFC rounds.

Actually it is probably worthwhile to include this explanation in the
change log. Can't expect everyone to have read a kernel thread that
contains hundreds of replies.

>
> Here is the detailed execution scenario showing the race with unlock;
> it demonstrates that we need to order user-space loads before the
> mm_cpumask store. The sequence below involves the userspace RCU
> library and the Linux kernel with sys_membarrier(). As we see, the
> lack of ordering between "cpumask_clear" and user-space execution
> creates a race with the scheduler where sys_membarrier() incorrectly
> considers CPU 1 as not needing a membarrier_ipi.
>
>
> CPU 0                         CPU 1
> --------------                --------------
> <user space>                  <user space> (already in read-side C.S.)
>                               obj = rcu_dereference(list->next);
>                                 -> load list->next
>                               copy = obj->foo;
>                               rcu_read_unlock();
>                                 -> store to urcu_reader.ctr
>                               <urcu_reader.ctr store is globally visible>
> list_del(obj);
>                               <preempt>
>                               <kernel space>
>
>                               schedule();
>                               cpumask_clear(mm_cpumask, cpu);
>
> sys_membarrier();
> <kernel space>
> smp_mb()
> iterates on mm_cpumask
> smp_mb()
> <user space>
> set global g.p. (urcu_gp_ctr) phase to 1
> wait for all urcu_reader.ctr in phase 0
> set global g.p. (urcu_gp_ctr) phase to 0
> wait for all urcu_reader.ctr in phase 1
> sys_membarrier();
> <kernel space>
> smp_mb()
> iterates on mm_cpumask
> smp_mb()
> <user space>
> free(obj);
>                               <list->next load hits memory>
>                               <obj->foo load hits memory>  <- corruption
>
>
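For anyone trying to map that trace onto code, the shape I have in mind
is roughly the sketch below. To be clear, this is just my reading of the
scenario, not necessarily Mathieu's exact patch:

/* IPI handler: issue the barrier on the interrupted CPU. */
static void membarrier_ipi(void *unused)
{
	smp_mb();
}

SYSCALL_DEFINE0(membarrier)
{
	/*
	 * Order the caller's prior accesses (the list_del() in the
	 * trace) before the mm_cpumask read and the IPIs.
	 */
	smp_mb();
	preempt_disable();
	/* "iterates on mm_cpumask": IPI every CPU running this mm. */
	smp_call_function_many(mm_cpumask(current->mm),
			       membarrier_ipi, NULL, 1);
	preempt_enable();
	/*
	 * Order the IPI completions before the caller's following
	 * accesses (the free(obj) in the trace).
	 */
	smp_mb();
	return 0;
}

The race is exactly the window where CPU 1 has already dropped out of
mm_cpumask but its user-space loads have not yet hit memory, so the
loop above never sends it an IPI. That is why the mm_cpumask updates
themselves need barriers.
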
> There is a similar race scenario (symmetric to the one above) between
> cpumask_set and a following user-space rcu_read_lock(), which I could
> provide upon request.

If it would help others to understand, it would be worthwhile to
explain that too. So, I'm requesting it ;-)

>
> This results in the following comments around smp_mb() in
> sys_membarrier:
>
> First sys_membarrier smp_mb():
>
> /*
> * Memory barrier on the caller thread before sending first
> * IPI. Orders all memory accesses prior to sys_membarrier() before
> * mm_cpumask read and membarrier_ipi executions.
> * Matches memory barriers before and after mm_cpumask updates.
> */

The above comment does not really explain what it is protecting. It just
states that it serializes memory access. Well, duh, that's what smp_mb()
does ;-) It's the equivalent of i++; /* increment i */

Basically, the comment should explain "why" the memory barrier is
needed.
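
Something that spells out the pairing would be far more useful. Off the
top of my head, and going only by the scenario above:

	/*
	 * Order memory accesses of the caller (e.g. the list_del()
	 * before the grace period) before the mm_cpumask read and
	 * the membarrier_ipi executions. Pairs with the barriers
	 * around the mm_cpumask updates at task switch: if we see a
	 * CPU's bit already cleared, that CPU's last user-space
	 * accesses are already globally visible, so skipping its
	 * IPI is safe.
	 */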

>
> Second sys_membarrier smp_mb():
>
> /*
> * Memory barrier on the caller thread after we finished
> * waiting for the last IPI. Orders mm_cpumask read and membarrier_ipi
> * executions before memory accesses following sys_membarrier()
> * execution.
> * Matches memory barriers before and after mm_cpumask updates.
> */

Ditto.
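
Again, going by the symmetric scenario, something along the lines of:

	/*
	 * Order the mm_cpumask read and the membarrier_ipi executions
	 * before memory accesses of the caller that follow
	 * sys_membarrier() (e.g. the free(obj) after the grace
	 * period). Pairs with the barriers around the mm_cpumask
	 * updates at task switch: a cleared bit means the departing
	 * CPU's user-space accesses have completed, and this barrier
	 * keeps our following accesses from overtaking the mask read.
	 */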

-- Steve

