Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

From: Andy Lutomirski
Date: Fri Apr 08 2016 - 11:58:52 EST


On Apr 7, 2016 11:41 PM, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> > enter the critical section:
> > 1:
> > movq %[cpu], %%r12
> > movq {address of counter for our cpu}, %%r13
> > movq {some fresh value}, (%%r13)
> > cmpq %[cpu], %%r12
> > jne 1b
>
> This is inherently racy; you forgot the detail of 'some fresh value',
> but since you want to avoid collisions you really want an increment.
>
> But load-store archs cannot do that. Or rather, they need to do:
>
> load Rn, $event
> add Rn, Rn, 1
> store $event, Rn
>
> But if they're preempted in the middle, two threads will collide and
> generate the _same_ increment. Comparing CPU numbers will not fix that.

Even on x86 this won't work -- we have no actual guarantee we're on
the right CPU, so we'd have to use an atomic.
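
To make the collision concrete, here is a minimal single-threaded
sketch that replays the bad interleaving by hand. The names
(thread_regs, op_load, op_add, op_store and the two helper functions)
are hypothetical and are not part of the patch set; the second helper
uses C11 atomic_fetch_add as a stand-in for "use an atomic".

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of one thread's private register Rn. */
struct thread_regs { uint64_t rn; };

static void op_load(struct thread_regs *t, const uint64_t *event) { t->rn = *event; }
static void op_add(struct thread_regs *t) { t->rn += 1; }
static void op_store(const struct thread_regs *t, uint64_t *event) { *event = t->rn; }

/*
 * Thread A is preempted right after its load, thread B runs the whole
 * load/add/store sequence, then A resumes with its stale register.
 * Returns 1 if both threads ended up publishing the same value.
 */
static int preempted_increment_collides(void)
{
    uint64_t event = 0;
    struct thread_regs a, b;
    uint64_t a_value, b_value;

    op_load(&a, &event);        /* A: load  Rn, $event (then preempted) */

    op_load(&b, &event);        /* B: load  Rn, $event */
    op_add(&b);                 /* B: add   Rn, Rn, 1  */
    op_store(&b, &event);       /* B: store $event, Rn */
    b_value = event;

    op_add(&a);                 /* A resumes with the value it loaded earlier */
    op_store(&a, &event);
    a_value = event;

    return a_value == b_value;
}

/*
 * With an atomic read-modify-write there is no window between the load
 * and the store, so successive increments cannot hand out the same value.
 * Returns 1 if the two values differ.
 */
static int atomic_increment_values_differ(void)
{
    _Atomic uint64_t event = 0;
    uint64_t first  = atomic_fetch_add(&event, 1) + 1;
    uint64_t second = atomic_fetch_add(&event, 1) + 1;

    return first != second;
}
```

Note that the CPU number never enters into the first helper: both
threads can be on the same CPU the whole time and still collide, which
is why comparing CPU numbers afterwards does not help.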

I was thinking we'd allocate from a per-thread pool (say 24 bits of
thread ID and the rest being a nonce). On load-store architectures
this wouldn't be async-signal-safe, though. Hmm.
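
A rough sketch of what such a per-thread pool could look like,
assuming a 64-bit value with the thread ID in the top 24 bits and the
nonce in the remaining 40. The layout, struct event_pool and
pool_next() are illustrative assumptions only, not something the
patches define.

```c
#include <stdint.h>

#define TID_BITS    24
#define NONCE_BITS  (64 - TID_BITS)                     /* 40 bits left for the nonce */
#define TID_MASK    ((UINT64_C(1) << TID_BITS) - 1)
#define NONCE_MASK  ((UINT64_C(1) << NONCE_BITS) - 1)

/* Hypothetical per-thread state; each thread owns exactly one of these. */
struct event_pool {
    uint32_t tid;    /* thread ID, assumed to fit in 24 bits */
    uint64_t nonce;  /* private counter, only touched by the owning thread */
};

/*
 * Hand out the next value for this thread. Values from different
 * threads differ in the thread-ID bits, so no cross-thread atomic is
 * needed. The nonce bump is a plain load/add/store, though, so a
 * signal handler that runs in the middle of it and calls pool_next()
 * on the same pool can receive the same value as the interrupted code.
 */
static uint64_t pool_next(struct event_pool *p)
{
    uint64_t nonce = p->nonce++ & NONCE_MASK;

    return ((p->tid & TID_MASK) << NONCE_BITS) | nonce;
}
```

The nonce wraps after 2^40 allocations per thread, at which point a
value could repeat; whether that matters depends on how long a stale
value can stay visible.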

--Andy