Re: [RFC][PATCH 5/7] x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()

From: Linus Torvalds
Date: Tue Mar 24 2020 - 16:57:40 EST


On Tue, Mar 24, 2020 at 1:45 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> OK... BTW, I'd been trying to recall the reasons for the way
> __futex_atomic_op2() loop is done; ISTR some discussion along
> the lines of cacheline ping-pong prevention, but I'd been unable
> to reconstruct enough details to find it and I'm not sure it
> hadn't been about some other code ;-/

No, that doesn't look like any cacheline advantage I can think of -
quite the reverse.

I suspect it's just lazy code, with the reload being unnecessary. Or
it might be written that way because you avoid an extra variable.

In fact, from a cacheline optimization standpoint, there are
advantages to not doing the load even on the initial run: if you know
a certain value is particularly likely, there are advantages to just
_assuming_ that value, rather than loading it. So no initial load at
all, and just initialize the first value to the likely case.

That can avoid an unnecessary "load for shared ownership" cacheline
state transition (since the cmpxchg will want to turn it into an
exclusive modified cacheline anyway).

But I don't think that optimization is likely the case here, and
you're right, the loop would be better written with the initial load
outside the loop.

Linus