Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

From: Leonardo Bras
Date: Thu Jan 04 2024 - 23:46:07 EST


On Thu, Jan 04, 2024 at 01:51:20PM -0800, Boqun Feng wrote:
> On Thu, Jan 04, 2024 at 05:41:26PM -0300, Leonardo Bras wrote:
> > On Thu, Jan 04, 2024 at 11:53:45AM -0800, Boqun Feng wrote:
> > > On Wed, Jan 03, 2024 at 01:31:59PM -0300, Leonardo Bras wrote:
> > > > In this header every xchg define (_relaxed, _acquire, _release, vanilla)
> > > > contains its own asm file, both for 4-byte variables and 8-byte variables,
> > > > for a total of 8 versions of mostly the same asm.
> > > >
> > > > This is usually bad, as it means any change may be done in up to 8
> > > > different places.
> > > >
> > > > Unify those versions by creating a new define with enough parameters to
> > > > generate any version of the previous 8.
> > > >
> > > > Then unify the result under a more general define, and simplify
> > > > arch_xchg* generation.
> > > >
> > > > (This did not cause any change in generated asm)
> > > >
> > > > Signed-off-by: Leonardo Bras <leobras@xxxxxxxxxx>
> > > > Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
> > > > Reviewed-by: Andrea Parri <parri.andrea@xxxxxxxxx>
> > > > Tested-by: Guo Ren <guoren@xxxxxxxxxx>
> > > > ---
> > > > arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
> > > > 1 file changed, 23 insertions(+), 115 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > > index 2f4726d3cfcc2..48478a8eecee7 100644
> > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > @@ -11,140 +11,48 @@
> > > > #include <asm/barrier.h>
> > > > #include <asm/fence.h>
> > > >
> > > > -#define __xchg_relaxed(ptr, new, size) \
> > > > +#define __arch_xchg(sfx, prepend, append, r, p, n) \
> > > > ({ \
> > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > - case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.w %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> >
> > Hello Boqun, thanks for reviewing!
> >
> > >
> > > Hmm... actually xchg_relaxed() doesn't need to be a barrier(), so the
> > > "memory" clobber is not needed here. Of course, it's out of the
> > > scope of this series, but I'm curious to see what would happen if we
> > > remove the "memory" clobber from _relaxed() ;-)
> >
> > Nice question :)
> > I am happy my patch can help bring up those ideas :)
> >
> >
> > According to gcc.gnu.org:
> >
> > ---
> > "memory" [clobber]:
> >
> > The "memory" clobber tells the compiler that the assembly code
> > performs memory reads or writes to items other than those listed in
> > the input and output operands (for example, accessing the memory
> > pointed to by one of the input parameters). To ensure memory contains
>
> Note here it says "other than those listed in the input and output
> operands", and in the above asm block, the memory pointed to by "__ptr" is
> already marked as read-and-write by the asm block via "+A" (*__ptr), so
> the compiler knows the asm block may modify the memory pointed to by
> "__ptr", therefore in the _relaxed() case, the "memory" clobber can be avoided.

Thanks for pointing that out!
That helped me improve my understanding of constraints for asm operands :)
(I ended up getting even more info from the gcc manual)

So the "+A" constraint means the operand is both read and written by the
asm, and that it is a memory operand whose address is held in a register.
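
To check my understanding, here is a minimal user-space sketch. It assumes
GCC or Clang, and it uses the generic "+m" constraint as a stand-in for the
RISC-V-only "+A" so that it builds on any architecture. asm_touches() and
reload_after_asm() are names I made up for illustration, they are not in
cmpxchg.h.

```c
#include <assert.h>

/*
 * Hypothetical helper: an empty asm that lists *p as a read-write memory
 * operand and has NO "memory" clobber. "+m" is the generic counterpart of
 * the RISC-V "+A" constraint used in the patch.
 */
static inline void asm_touches(int *p)
{
	__asm__ __volatile__("" : "+m" (*p));
}

/* Hypothetical demo: sums *p (read after the asm) and *q (read before it). */
static inline int reload_after_asm(int *p, int *q)
{
	int cached_q = *q;	/* not an operand: may stay cached in a register */

	asm_touches(p);		/* compiler must assume *p was read and written */

	return *p + cached_q;	/* *p has to be loaded again here */
}
```

The empty template does nothing at run time; the point is only which
assumptions the operand list forces on the compiler.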

>
> Here is an example showing the difference. Consider the following case:
>
> this_val = *this;
> that_val = *that;
> xchg_relaxed(this, 1);
> reread_this = *this;
>
> by the semantics of _relaxed, compilers can optimize the above into
>
> this_val = *this;
> xchg_relaxed(this, 1);
> that_val = *that;
> reread_this = *this;
>

Seems correct, since there is no barrier().

> but the "memory" clobber in the xchg_relaxed() will provide this.

By 'this' here you mean the barrier? I mean, IIUC the "memory" clobber will
prevent the above optimization, right?
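
For reference, a minimal sketch of what I understand a pure compiler
barrier to be: an empty asm whose only effect is the "memory" clobber,
which is how barrier() is usually spelled for GCC. compiler_barrier() and
sum_around_barrier() are made-up names for illustration; nothing here is
RISC-V specific and no fence instruction is emitted.

```c
#include <assert.h>

/* Hypothetical name: empty asm, only the "memory" clobber. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical demo: one load on each side of the compiler barrier. */
static inline int sum_around_barrier(int *this_ptr, int *that_ptr)
{
	int that_val = *that_ptr;	/* load stays before the barrier */

	compiler_barrier();		/* no memory access may be moved across */

	return that_val + *this_ptr;	/* load stays after the barrier */
}
```

This only constrains the compiler; as the gcc text says, ordering on the
processor side still needs a fence instruction.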

> Needless to say the '"+A" (*__ptr)' prevents the compiler from doing the
> following optimization:
>
> this_val = *this;
> that_val = *that;
> xchg_relaxed(this, 1);
> reread_this = this_val;
>
> since the compiler knows the asm block will read and write *this.

Right, the compiler knows that address will be written by the asm block, and
so it reloads the value instead of reusing the old one.
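
Your example, written out as a small user-space sketch. I am assuming
GCC/Clang and using the __atomic_exchange_n() builtin with __ATOMIC_RELAXED
as a stand-in for xchg_relaxed(); xchg_relaxed_sketch(), struct snapshot and
run_example() are hypothetical names, not kernel API. It only shows the
values the program must observe, not what the compiler actually emits.

```c
#include <assert.h>

/* Stand-in for xchg_relaxed(): atomic swap with no ordering guarantees. */
static inline int xchg_relaxed_sketch(int *ptr, int new_val)
{
	return __atomic_exchange_n(ptr, new_val, __ATOMIC_RELAXED);
}

/* Hypothetical container for the values read in the example. */
struct snapshot {
	int this_val;
	int that_val;
	int reread_this;
};

static inline struct snapshot run_example(int *this_ptr, int *that_ptr)
{
	struct snapshot s;

	s.this_val = *this_ptr;
	s.that_val = *that_ptr;		/* may legally move below the xchg */
	xchg_relaxed_sketch(this_ptr, 1);
	s.reread_this = *this_ptr;	/* must see 1, not the old this_val */

	return s;
}
```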


A question, though:
Do we need the "memory" clobber in any other xchg / cmpxchg asm?
I mean, usually the only write to memory happens through *__ptr, which
should already be covered by "+A".

I understand that since the others are not "relaxed" they need to
have a barrier, but isn't the compiler supposed to understand the barrier
instruction and avoid reordering / optimizing across that
instruction?


Thanks!
Leo

> Regards,
> Boqun
>
> > correct values, GCC may need to flush specific register values to
> > memory before executing the asm. Further, the compiler does not assume
> > that any values read from memory before an asm remain unchanged after
> > that asm ; it reloads them as needed. Using the "memory" clobber
> > effectively forms a read/write memory barrier for the compiler.
> >
> > Note that this clobber does not prevent the processor from doing
> > speculative reads past the asm statement. To prevent that, you need
> > processor-specific fence instructions.
> > ---
> >
> > IIUC above text says that having memory accesses to *__ptr would require
> > above asm to have the "memory" clobber, so memory accesses don't get
> > reordered by the compiler.
> >
> > By above affirmation, all asm in this file should have the "memory"
> > clobber, since all atomic operations will change memory pointed by an input
> > ptr. Is that correct?
> >
> > Thanks!
> > Leo
> >
> >
> > >
> > > Regards,
> > > Boqun
> > >
> > > > - break; \
> > > > - case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.d %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - default: \
> > > > - BUILD_BUG(); \
> > > > - } \
> > > > - __ret; \
> > > > -})
> > > > -
> > > > -#define arch_xchg_relaxed(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __xchg_relaxed((ptr), \
> > > > - _x_, sizeof(*(ptr))); \
> > > > + __asm__ __volatile__ ( \
> > > > + prepend \
> > > > + " amoswap" sfx " %0, %2, %1\n" \
> > > > + append \
> > > > + : "=r" (r), "+A" (*(p)) \
> > > > + : "r" (n) \
> > > > + : "memory"); \
> > > > })
> > > >
> > > > -#define __xchg_acquire(ptr, new, size) \
> > > > +#define _arch_xchg(ptr, new, sfx, prepend, append) \
> > > > ({ \
> > > > __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > + __typeof__(*(__ptr)) __new = (new); \
> > > > + __typeof__(*(__ptr)) __ret; \
> > > > + switch (sizeof(*__ptr)) { \
> > > > case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.w %0, %2, %1\n" \
> > > > - RISCV_ACQUIRE_BARRIER \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > + __arch_xchg(".w" sfx, prepend, append, \
> > > > + __ret, __ptr, __new); \
> > > > break; \
> > > > case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.d %0, %2, %1\n" \
> > > > - RISCV_ACQUIRE_BARRIER \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > + __arch_xchg(".d" sfx, prepend, append, \
> > > > + __ret, __ptr, __new); \
> > > > break; \
> > > > default: \
> > > > BUILD_BUG(); \
> > > > } \
> > > > - __ret; \
> > > > + (__typeof__(*(__ptr)))__ret; \
> > > > })
> > > >
> > > > -#define arch_xchg_acquire(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __xchg_acquire((ptr), \
> > > > - _x_, sizeof(*(ptr))); \
> > > > -})
> > > > +#define arch_xchg_relaxed(ptr, x) \
> > > > + _arch_xchg(ptr, x, "", "", "")
> > > >
> > > > -#define __xchg_release(ptr, new, size) \
> > > > -({ \
> > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > - case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - RISCV_RELEASE_BARRIER \
> > > > - " amoswap.w %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - RISCV_RELEASE_BARRIER \
> > > > - " amoswap.d %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - default: \
> > > > - BUILD_BUG(); \
> > > > - } \
> > > > - __ret; \
> > > > -})
> > > > +#define arch_xchg_acquire(ptr, x) \
> > > > + _arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
> > > >
> > > > #define arch_xchg_release(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __xchg_release((ptr), \
> > > > - _x_, sizeof(*(ptr))); \
> > > > -})
> > > > -
> > > > -#define __arch_xchg(ptr, new, size) \
> > > > -({ \
> > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > - case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.w.aqrl %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.d.aqrl %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - default: \
> > > > - BUILD_BUG(); \
> > > > - } \
> > > > - __ret; \
> > > > -})
> > > > + _arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
> > > >
> > > > #define arch_xchg(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr))); \
> > > > -})
> > > > + _arch_xchg(ptr, x, ".aqrl", "", "")
> > > >
> > > > #define xchg32(ptr, x) \
> > > > ({ \
> > > > --
> > > > 2.43.0
> > > >
> > >
> >
>