Re: [RFC][PATCH 22/31] locking,tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()

From: Chris Metcalf
Date: Mon Apr 25 2016 - 17:11:34 EST


[Grr, resending as text/plain; I have no idea what inspired Thunderbird
to send this as multipart/mixed with HTML.]

On 4/22/2016 5:04 AM, Peter Zijlstra wrote:
Implement FETCH-OP atomic primitives. These are very similar to the
existing OP-RETURN primitives we already have, except they return the
value of the atomic variable _before_ modification.

This is especially useful for irreversible operations -- such as
bitops (because it becomes impossible to reconstruct the state prior
to modification).
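
To illustrate the distinction outside the kernel, here is a minimal sketch in portable C11 using <stdatomic.h>. The helper names demo_fetch_or() and demo_or_return() are hypothetical and are not part of the kernel's atomic_t API; the sketch only shows why OP-RETURN can be derived from FETCH-OP but not the other way around for an operation like OR.

```c
#include <stdatomic.h>

/* Hypothetical helpers for illustration; not the kernel atomic_t API. */

/* FETCH-OP: returns the value the variable held *before* the OR. */
static inline int demo_fetch_or(atomic_int *v, int mask)
{
	return atomic_fetch_or(v, mask);
}

/*
 * OP-RETURN built on FETCH-OP: reapply the operation to the old value
 * to obtain the value the variable holds *after* the OR.
 */
static inline int demo_or_return(atomic_int *v, int mask)
{
	return atomic_fetch_or(v, mask) | mask;
}
```

Starting from 0x5 or from 0x7, OR-ing in 0x2 leaves 0x7 in both cases, so the new value alone cannot tell the caller which state it started from; only the FETCH-OP form reports that. For add, by contrast, the old value is always recoverable as the new value minus i.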

XXX please look at the tilegx (CONFIG_64BIT) atomics, I think we get
the barriers wrong (at the very least they're inconsistent).

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
 arch/tile/include/asm/atomic.h    |   4 +
 arch/tile/include/asm/atomic_32.h |  60 +++++++++++++------
 arch/tile/include/asm/atomic_64.h | 117 +++++++++++++++++++++++++-------------
 arch/tile/include/asm/bitops_32.h |  18 ++---
 arch/tile/lib/atomic_32.c         |  42 ++++++-------
 arch/tile/lib/atomic_asm_32.S     |  14 ++--
 6 files changed, 161 insertions(+), 94 deletions(-)

[...]
 static inline int atomic_add_return(int i, atomic_t *v)
 {
 	int val;
 	smp_mb();  /* barrier for proper semantics */
 	val = __insn_fetchadd4((void *)&v->counter, i) + i;
 	barrier();  /* the "+ i" above will wait on memory */
+	/* XXX smp_mb() instead, as per cmpxchg() ? */
 	return val;
 }

The existing code is subtle but I'm pretty sure it's not a bug.

The tilegx architecture will take the "+ i" and generate an add instruction.
The compiler barrier will make sure the add instruction happens before
anything else that could touch memory, and the microarchitecture will make
sure that the result of the atomic fetchadd has been returned to the core
before any further instructions are issued. (The memory architecture is
lazy, but when you feed a load through an arithmetic operation, we block
issuing any further instructions until the add's operands are available.)

This would not be an adequate memory barrier in general, since other loads
or stores might still be in flight, even if the "val" operand had made it
from memory to the core at this point. However, we have issued no other
stores or loads since the previous memory barrier, so we know that there
can be no other loads or stores in flight, and thus the compiler barrier
plus arithmetic op is equivalent to a memory barrier here.
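
For readers without a tilegx toolchain, the shape of that sequence can be sketched in portable C11. This is only an analogy under stated assumptions: sketch_add_return() is a hypothetical name, atomic_thread_fence() stands in for smp_mb(), and atomic_signal_fence() stands in for the compiler-only barrier(). Portable C gives no guarantee that the trailing compiler fence plus the dependent add acts as a full memory barrier; that property comes from the tilegx microarchitecture as described above.

```c
#include <stdatomic.h>

/* Hypothetical C11 analogue of the tilegx sequence; not kernel code. */
static inline int sketch_add_return(atomic_int *counter, int i)
{
	int val;

	/* Stand-in for smp_mb(): order all earlier loads and stores. */
	atomic_thread_fence(memory_order_seq_cst);

	/*
	 * The fetch-add returns the old value. The "+ i" consumes that
	 * result, so the add cannot complete until the value has come
	 * back from memory.
	 */
	val = atomic_fetch_add_explicit(counter, i, memory_order_relaxed) + i;

	/*
	 * Stand-in for barrier(): a compiler-only fence. It stops the
	 * compiler from moving later memory accesses above this point
	 * but emits no hardware fence instruction.
	 */
	atomic_signal_fence(memory_order_seq_cst);

	return val;
}
```

The point of the sketch is the ordering of the three steps: a full barrier first, then the only memory operation in the sequence, then a dependent arithmetic op pinned in place by a compiler barrier.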

In hindsight, perhaps a more substantial comment would have been helpful
here. Unless you see something missing in my analysis, I'll plan to go
ahead and add a suitable comment here :-)

Otherwise, though based only on code inspection so far:

Acked-by: Chris Metcalf <cmetcalf@xxxxxxxxxxxx> [for tile]

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com