Re: [PATCH RFC] percpu: add data dependency barrier in percpu accessors and operations

From: Paul E. McKenney
Date: Tue Jul 15 2014 - 13:41:48 EST


On Tue, Jul 15, 2014 at 10:06:01AM -0500, Christoph Lameter wrote:
> On Tue, 15 Jul 2014, Paul E. McKenney wrote:
>
> > On Tue, Jul 15, 2014 at 09:06:00AM -0500, Christoph Lameter wrote:
> > > On Tue, 15 Jul 2014, Paul E. McKenney wrote:
> > >
> > > > If I understand your initialization procedure correctly, you need at least
> > > > an smp_wmb() on the update side and at least an smp_read_barrier_depends()
> > > > on the read side.
> > >
> > > A barrier for data that is not in the cache of the read side, data
> > > that has not been accessed yet? (Well, there could have been a free_percpu
> > > before, but if so, the cache line was evicted by the initialization code.)
> >
> > http://www.openvms.compaq.com/wizard/wiz_2637.html
>
> Not sure what the intent of this link is?

To demonstrate that at least one (mostly historical but nevertheless
very real) architecture can do this:

p = ACCESS_ONCE(gp);
r1 = p->a;

and see pre-initialized data in r1 -even- -if- the initialization made
full and careful use of memory barriers. Aggressive (and mostly not
yet real-world) compiler optimizations can have the same effect.
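For readers less used to this pattern, here is a minimal userspace sketch of the publish/subscribe pairing behind the two lines above. It assumes C11 atomics are acceptable stand-ins for the kernel primitives: a release store in place of smp_wmb() followed by the store to gp, and a memory_order_consume load in place of ACCESS_ONCE(gp) followed by smp_read_barrier_depends(). The names struct foo, foo_storage, updater(), reader() and run_publish_demo() are hypothetical. GCC and Clang currently implement consume as acquire, so this sketch shows where the ordering belongs rather than reproducing the Alpha behavior.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical structure standing in for whatever gp points to. */
struct foo {
	int a;
};

static struct foo foo_storage;

/* Global pointer: NULL until the updater publishes foo_storage. */
static _Atomic(struct foo *) gp = NULL;

/* Update side: finish initialization, then publish the pointer.
 * The release store plays the role of smp_wmb() + pointer store. */
static void *updater(void *arg)
{
	(void)arg;
	foo_storage.a = 42;
	atomic_store_explicit(&gp, &foo_storage, memory_order_release);
	return NULL;
}

/* Read side: the consume load plays the role of ACCESS_ONCE(gp)
 * followed by smp_read_barrier_depends(); the load of p->a depends on p. */
static void *reader(void *arg)
{
	int *r1 = arg;
	struct foo *p;

	while ((p = atomic_load_explicit(&gp, memory_order_consume)) == NULL)
		;	/* spin until the pointer is published */
	*r1 = p->a;
	return NULL;
}

/* Start a reader and an updater; return the value the reader saw. */
int run_publish_demo(void)
{
	pthread_t t_rd, t_upd;
	int r1 = -1;
	int rc;

	/* Reset state; pthread_create() orders these stores before the threads. */
	atomic_store_explicit(&gp, NULL, memory_order_relaxed);
	foo_storage.a = 0;

	rc = pthread_create(&t_rd, NULL, reader, &r1);
	assert(rc == 0);
	rc = pthread_create(&t_upd, NULL, updater, NULL);
	assert(rc == 0);
	pthread_join(t_rd, NULL);
	pthread_join(t_upd, NULL);
	return r1;
}
```

run_publish_demo() should return 42. If either side is weakened to memory_order_relaxed, that guarantee disappears, and the plain access to foo_storage.a becomes a data race.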

> > Besides which, if you don't have barriers on the initialization side,
> > then both the CPU and the compiler are free to update the pointer before
> > completing the initialization, which can leave old stuff still in other
> > CPUs' caches for long enough to break you.
>
> The cachelines will be evicted from the other processors at
> initialization. alloc_percpu *itself* zeroes all data in each percpu area
> before returning the offset to the percpu data structure. See
> pcpu_populate_chunk(). At that point *all* other processors no longer
> have those cachelines in their caches. The initialization done with values
> specific to the subsystem is not that important.
>
> The return value of the function is only available after
> pcpu_populate_chunk() returns.
>
> Access to those cachelines is possible only after the other processors
> have obtained the offset that was stored in some data structure. That
> usually involves additional synchronization, which implies barriers
> anyway.
>
> I do not think there is anything here.

Sorry, but whether you see it or not, there is a very real need for at
least an smp_wmb() from the initializing code and at least an
smp_read_barrier_depends() from the reading code.
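Applied to the percpu case under discussion, the same pairing might look like the following hypothetical sketch, again using C11 atomics as userspace stand-ins for smp_wmb() and smp_read_barrier_depends(). It models each CPU's copy as a row of a static array and publishes an offset rather than a pointer; each per-CPU address is computed from the loaded offset, so the reads carry a dependency on it. The initializer zeroes each slot first (mirroring the alloc-time zeroing described above) and then stores the subsystem-specific value; per the argument above, the zeroing does not remove the need for the barrier pairing. All names (areas, cpu_ptr(), pcpu_initializer(), pcpu_reader(), run_percpu_demo()) are illustrative assumptions, not the kernel's percpu API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define NR_DEMO_CPUS	4	/* hypothetical number of CPUs */
#define AREA_INTS	16	/* hypothetical ints per per-CPU area */

/* One row per CPU, modelling the per-CPU copies of a percpu chunk. */
static int areas[NR_DEMO_CPUS][AREA_INTS];

/* Offset stored "in some data structure"; -1 means not yet published. */
static _Atomic long published_off = -1;

/* Hypothetical per-CPU address computation: the CPU's base plus offset. */
static int *cpu_ptr(int cpu, long off)
{
	return &areas[cpu][off];
}

/* Initializing side: zero each CPU's copy (as allocation does), apply
 * subsystem-specific values, and only then publish the offset.
 * The release store stands in for smp_wmb() before the offset store. */
static void *pcpu_initializer(void *arg)
{
	long off = *(long *)arg;

	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		memset(cpu_ptr(cpu, off), 0, sizeof(int));
		*cpu_ptr(cpu, off) = 100 * cpu + 7;
	}
	atomic_store_explicit(&published_off, off, memory_order_release);
	return NULL;
}

/* Reading side: every address is computed from the loaded offset, so
 * the consume load stands in for smp_read_barrier_depends(). */
static void *pcpu_reader(void *arg)
{
	int *seen = arg;
	long off;

	while ((off = atomic_load_explicit(&published_off,
					   memory_order_consume)) < 0)
		;	/* spin until the offset is published */
	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++)
		seen[cpu] = *cpu_ptr(cpu, off);
	return NULL;
}

/* Publish the area at offset off (0 <= off < AREA_INTS); return 1 if the
 * reader saw every subsystem-specific value, 0 otherwise. */
int run_percpu_demo(long off)
{
	pthread_t t_init, t_rd;
	int seen[NR_DEMO_CPUS];
	int rc;

	atomic_store_explicit(&published_off, -1, memory_order_relaxed);
	rc = pthread_create(&t_rd, NULL, pcpu_reader, seen);
	assert(rc == 0);
	rc = pthread_create(&t_init, NULL, pcpu_initializer, &off);
	assert(rc == 0);
	pthread_join(t_rd, NULL);
	pthread_join(t_init, NULL);

	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++)
		if (seen[cpu] != 100 * cpu + 7)
			return 0;
	return 1;
}
```

Publishing an offset instead of a pointer does not change the argument: the reader's addresses still depend on the value it loaded, and that dependency is what the read-side barrier orders.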

Thanx, Paul
