Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64

From: Bernard Metzler
Date: Fri Jul 12 2019 - 07:34:00 EST


-----"Arnd Bergmann" <arnd@xxxxxxxx> wrote: -----

>To: "Bernard Metzler" <bmt@xxxxxxxxxxxxxx>, "Doug Ledford"
><dledford@xxxxxxxxxx>, "Jason Gunthorpe" <jgg@xxxxxxxx>
>From: "Arnd Bergmann" <arnd@xxxxxxxx>
>Date: 07/12/2019 10:52AM
>Cc: "Arnd Bergmann" <arnd@xxxxxxxx>, "Peter Zijlstra"
><peterz@xxxxxxxxxxxxx>, "Jason Gunthorpe" <jgg@xxxxxxxxxxxx>,
>linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>Subject: [EXTERNAL] [PATCH] rdma/siw: avoid smp_store_mb() on a u64
>
>The new siw driver fails to build on i386 with
>
>drivers/infiniband/sw/siw/siw_qp.c:1025:3: error: invalid output size
>for constraint '+q'
> smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);
> ^
>include/asm-generic/barrier.h:141:35: note: expanded from macro
>'smp_store_mb'
> #define smp_store_mb(var, value) __smp_store_mb(var, value)
> ^
>arch/x86/include/asm/barrier.h:65:47: note: expanded from macro
>'__smp_store_mb'
> #define __smp_store_mb(var, value) do { (void)xchg(&var, value); }
>while (0)
> ^
>include/asm-generic/atomic-instrumented.h:1648:2: note: expanded from
>macro 'xchg'
> arch_xchg(__ai_ptr, __VA_ARGS__);
> \
> ^
>arch/x86/include/asm/cmpxchg.h:78:27: note: expanded from macro
>'arch_xchg'
> #define arch_xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
> ^
>arch/x86/include/asm/cmpxchg.h:48:19: note: expanded from macro
>'__xchg_op'
> : "+q" (__ret), "+m" (*(ptr))
> \
> ^
>drivers/infiniband/sw/siw/siw_qp.o: In function `siw_sqe_complete':
>siw_qp.c:(.text+0x1450): undefined reference to `__xchg_wrong_size'
>drivers/infiniband/sw/siw/siw_qp.o: In function `siw_rqe_complete':
>siw_qp.c:(.text+0x15b0): undefined reference to `__xchg_wrong_size'
>drivers/infiniband/sw/siw/siw_verbs.o: In function
>`siw_req_notify_cq':
>siw_verbs.c:(.text+0x18ff): undefined reference to
>`__xchg_wrong_size'
>
>Since smp_store_mb() has to be an atomic store, but the architecture
>can only do this on 32-bit quantities or smaller, but 'cq->notify'
>is a 64-bit word.
>
>Apparently the smp_store_mb() is paired with a READ_ONCE() here,
>which
>seems like an odd choice because there is only a barrier on the
>writer
>side and not the reader, and READ_ONCE() is already not atomic on
>quantities larger than a CPU register.
>
>I suspect it is sufficient to use the (possibly nonatomic)
>WRITE_ONCE()
>and an SMP memory barrier here. If it does need to be atomic as well
>as 64-bit quantities, using an
>atomic64_set_release()/atomic64_read_acquire()
>may be a better choice.
>
>Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
>Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
>Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>Signed-off-by: Arnd Bergmann <arnd@xxxxxxxx>
>---
> drivers/infiniband/sw/siw/siw_qp.c | 4 +++-
> drivers/infiniband/sw/siw/siw_verbs.c | 5 +++--
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp.c
>b/drivers/infiniband/sw/siw/siw_qp.c
>index 11383d9f95ef..a2c08f17f13d 100644
>--- a/drivers/infiniband/sw/siw/siw_qp.c
>+++ b/drivers/infiniband/sw/siw/siw_qp.c
>@@ -1016,13 +1016,15 @@ static bool siw_cq_notify_now(struct siw_cq
>*cq, u32 flags)
> if (!cq->base_cq.comp_handler)
> return false;
>
>+ smp_rmb();
> cq_notify = READ_ONCE(*cq->notify);
>
> if ((cq_notify & SIW_NOTIFY_NEXT_COMPLETION) ||
> ((cq_notify & SIW_NOTIFY_SOLICITED) &&
> (flags & SIW_WQE_SOLICITED))) {
> /* dis-arm CQ */
>- smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);
>+ WRITE_ONCE(*cq->notify, SIW_NOTIFY_NOT);
>+ smp_wmb();
>
> return true;
> }
>diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>b/drivers/infiniband/sw/siw/siw_verbs.c
>index 32dc79d0e898..41c5ab293fe1 100644
>--- a/drivers/infiniband/sw/siw/siw_verbs.c
>+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>@@ -1142,10 +1142,11 @@ int siw_req_notify_cq(struct ib_cq *base_cq,
>enum ib_cq_notify_flags flags)
>
> if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
> /* CQ event for next solicited completion */
>- smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
>+ WRITE_ONCE(*cq->notify, SIW_NOTIFY_SOLICITED);
> else
> /* CQ event for any signalled completion */
>- smp_store_mb(*cq->notify, SIW_NOTIFY_ALL);
>+ WRITE_ONCE(*cq->notify, SIW_NOTIFY_ALL);
>+ smp_wmb();
>
> if (flags & IB_CQ_REPORT_MISSED_EVENTS)
> return cq->cq_put - cq->cq_get;
>--
>2.20.0
>
>


Hi Arnd,
Many thanks for pointing that out! Indeed, this CQ notification
mechanism does not take 32 bit architectures into account.
Since we have only three flags to hold here, it's probably better
to make it a 32bit value. That would remove the issue w/o
introducing extra smp_wmb(). I'd prefer smp_store_mb(),
since on some architectures it shall be more efficient.
That would also make it sufficient to use READ_ONCE.

That would be the proposed fix: