Re: [PATCH] ptr_ring: add barriers

From: Michael S. Tsirkin
Date: Wed Dec 06 2017 - 07:47:04 EST


On Wed, Dec 06, 2017 at 02:51:41PM +0530, George Cherian wrote:
> Hi Michael,
>
>
> On 12/06/2017 12:59 AM, Michael S. Tsirkin wrote:
> > Users of ptr_ring expect that it's safe to give the
> > data structure a pointer and have it be available
> > to consumers, but that actually requires an smb_wmb
> > or a stronger barrier.
> This is not the exact situation we are seeing.

Could you test the patch pls?

> Let me try to explain the situation
>
> Affected on ARM64 platform.
> 1) tun_net_xmit calls skb_array_produce, which pushes the skb to the
> ptr_ring, this push is protected by a producer_lock.
>
> 2)Prior to this call the tun_net_xmit calls skb_orphan which calls the
> skb->destructor and sets skb->destructor and skb->sk as NULL.
>
> 2.a) These 2 writes are getting reordered
>
> 3) At the same time in the receive side (tun_ring_recv), which gets executed
> in another core calls skb_array_consume which pulls the skb from ptr ring,
> this pull is protected by a consumer lock.
>
> 4) eventually calling the skb->destructor (sock_wfree) with stale values.
>
> Also note that this issue is reproducible in a long run and doesn't happen
> immediately after the launch of multiple VM's (infact the particular test
> cases launches 56 VM's which does iperf back and forth)
>
> >
> > In absence of such barriers and on architectures that reorder writes,
> > consumer might read an un=initialized value from an skb pointer stored
> > in the skb array. This was observed causing crashes.
> >
> > To fix, add memory barriers. The barrier we use is a wmb, the
> > assumption being that producers do not need to read the value so we do
> > not need to order these reads.
> It is not the case that producer is reading the value, but the consumer
> reading stale value. So we need to have a strict rmb in place .
>
> >
> > Reported-by: George Cherian <george.cherian@xxxxxxxxxx>
> > Suggested-by: Jason Wang <jasowang@xxxxxxxxxx>
> > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > ---
> >
> > George, could you pls report whether this patch fixes
> > the issue for you?
> >
> > This seems to be needed in stable as well.
> >
> >
> >
> >
> > include/linux/ptr_ring.h | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 37b4bb2..6866df4 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
> > /* Note: callers invoking this in a loop must use a compiler barrier,
> > * for example cpu_relax(). Callers must hold producer_lock.
> > + * Callers are responsible for making sure pointer that is being queued
> > + * points to a valid data.
> > */
> > static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> > {
> > if (unlikely(!r->size) || r->queue[r->producer])
> > return -ENOSPC;
> > + /* Make sure the pointer we are storing points to a valid data. */
> > + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> > + smp_wmb();
> > +
> > r->queue[r->producer++] = ptr;
> > if (unlikely(r->producer >= r->size))
> > r->producer = 0;
> > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> > if (ptr)
> > __ptr_ring_discard_one(r);
> > + /* Make sure anyone accessing data through the pointer is up to date. */
> > + /* Pairs with smp_wmb in __ptr_ring_produce. */
> > + smp_read_barrier_depends();
> > return ptr;
> > }
> >