Re: [PATCH] swiotlb: fix the check whether a device has used software IO TLB

From: Petr Tesařík
Date: Fri Sep 22 2023 - 09:31:52 EST


Hi Catalin,

thanks again for your reply. I'm sorry for being slow. This world of
weakly ordered memory models is complex, and I was too distracted most
of this week, but I hope I have finally wrapped my head around it.

On Mon, 18 Sep 2023 16:45:34 +0100
Catalin Marinas <catalin.marinas@xxxxxxx> wrote:

> On Sun, Sep 17, 2023 at 11:47:41AM +0200, Petr Tesařík wrote:
> > On Fri, 15 Sep 2023 18:09:28 +0100
> > Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > On Fri, Sep 15, 2023 at 11:13:43AM +0200, Petr Tesařík wrote:
> > > > On Thu, 14 Sep 2023 19:28:01 +0100
> > > > Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > > > What do the smp_wmb() barriers in swiotlb_find_slots() and
> > > > > swiotlb_dyn_alloc() order? The latter is even more unclear as it's at
> > > > > the end of the function and the "pairing" comment doesn't help.
> > > >
> > > > By the time swiotlb_find_slots() returns a valid slot index, the new
> > > > value of dev->dma_uses_io_tlb must be visible by all CPUs in
> > > > is_swiotlb_buffer(). The index is used to calculate the bounce buffer
> > > > address returned to device drivers. This address may be passed to
> > > > another CPU and used as an argument to is_swiotlb_buffer().
> > >
> > > Ah, I remember now. So the smp_wmb() ensures that dma_uses_io_tlb is
> > > seen by other CPUs before the slot address (presumably passed via other
> > > memory write). It may be worth updating the comment in the code (I'm
> > > sure I'll forget it in a month time). The smp_rmb() before READ_ONCE()
> > > in this patch is also needed for the same reasons (ordering after the
> > > read of the address passed to is_swiotlb_buffer()).
>[...]
> > > BTW, you may want to use WRITE_ONCE() when setting dma_uses_io_tlb (it
> > > also matches the READ_ONCE() in is_swiotlb_buffer()). Or you can use
> > > smp_store_mb() (but check its semantics first).
> >
> > I can use WRITE_ONCE(), although I believe it does not make much
> > difference thanks to the barrier provided by smp_wmb().
>
> WRITE_ONCE() is about atomicity rather than ordering (and avoiding
> compiler optimisations messing things). While I don't see the compiler
> generating multiple accesses for a boolean write, using these accessors
> also helps tools like kcsan.

While I still believe a simple assignment works just fine here, I agree
that WRITE_ONCE() is better. It can prevent potential bugs if someone
ever turns the boolean into something else.
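
For illustration only, here is a minimal userspace sketch of what the
accessors boil down to. The macros are simplified stand-ins (the kernel's
real WRITE_ONCE() and READ_ONCE() do additional type checking), and
struct fake_device and both helper functions are hypothetical names that
exist only in this example:

```c
#include <stdbool.h>

/*
 * Simplified userspace stand-ins for the kernel accessors: a volatile
 * access keeps the compiler from tearing, fusing or eliding the access.
 * Relies on the GNU __typeof__ extension.
 */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the relevant member of struct device. */
struct fake_device {
	bool dma_uses_io_tlb;
};

/* Writer side: what swiotlb_find_slots() would do after finding a slot. */
static void mark_uses_io_tlb(struct fake_device *dev)
{
	WRITE_ONCE(dev->dma_uses_io_tlb, true);
}

/* Reader side: the matching access in is_swiotlb_buffer(). */
static bool uses_io_tlb(struct fake_device *dev)
{
	return READ_ONCE(dev->dma_uses_io_tlb);
}
```

Neither accessor implies any ordering; that remains the job of the
barriers discussed below.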

>[...]
> > Ah... You may have a point after all if this sequence of events is
> > possible:
> >
> > - CPU 0 writes new value to mem->pools->next in swiotlb_dyn_alloc().
> >
> > - CPU 1 observes the new value in swiotlb_find_slots(), even though it
> > is not guaranteed by any barrier, allocates a slot and sets the
> > dev->dma_uses_io_tlb flag.
> >
> > - CPU 1 (driver code) writes the returned buffer address into its
> > private struct. This write is ordered after dev->dma_uses_io_tlb
> > thanks to the smp_wmb() in swiotlb_find_slots().
> >
> > - CPU 2 (driver code) reads the buffer address, and DMA core passes it
> > to is_swiotlb_buffer(), which contains smp_rmb().
> >
> > - IIUC CPU 2 is guaranteed to observe the new value of
> > dev->dma_uses_io_tlb, but it may still use the old value of
> > mem->pools->next, because the write on CPU 0 was not ordered
> > against anything. The fact that the new value was observed by CPU 1
> > does not mean that it is also observed by CPU 2.
>
> Yes, that's possible. On CPU 1 there is a control dependency between the
> read of mem->pools->next and the write of dev->dma_uses_io_tlb but I
> don't think this is sufficient to claim multi-copy atomicity (if CPU 1
> sees mem->pools->next write by CPU 0, CPU 2 must see it as well), at
> least not on all architectures supported by Linux. memory-barriers.txt
> says that a full barrier on CPU 1 is needed between the read and write,
> i.e. smp_mb() before WRITE_ONCE(dev->dma_uses_io_tlb). You could add it
> just before "goto found" in swiotlb_find_slots() since it's only needed
> on this path.

Let me check my understanding. This smp_mb() is not needed to make sure
that the write to dev->dma_uses_io_tlb cannot be visible before the
read of mem->pools->next. Since stores are not speculated, that
ordering is provided by the control dependency alone.

But a general barrier ensures that a third CPU which observes the write
to dev->dma_uses_io_tlb will also observe the write to mem->pools->next
that CPU 1 has read. Makes sense.
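
To make the placement of the barriers concrete, here is a rough userspace
sketch of the three-CPU sequence using C11 atomics. It is only an analogy:
C11 fences are not exact equivalents of smp_mb(), smp_wmb() and smp_rmb()
(a C11 release fence also orders earlier loads, which smp_wmb() does not),
so the sketch shows where each barrier sits, not precisely what each one
guarantees in the kernel. All variable and function names are made up for
this example:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int new_pool;                 /* hypothetical newly allocated pool */
static _Atomic(int *) pools_next;    /* stands in for mem->pools->next */
static atomic_bool dma_uses_io_tlb;  /* stands in for dev->dma_uses_io_tlb */
static _Atomic(int *) driver_buf;    /* address stored by driver code */

/* CPU 0: swiotlb_dyn_alloc() links the new pool into the list. */
static void cpu0_add_pool(void)
{
	atomic_store_explicit(&pools_next, &new_pool, memory_order_relaxed);
}

/* CPU 1: swiotlb_find_slots() followed by driver code. */
static bool cpu1_find_slot(void)
{
	int *pool = atomic_load_explicit(&pools_next, memory_order_relaxed);

	if (!pool)
		return false;

	/* Roughly where smp_mb() would go, just before "goto found". */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&dma_uses_io_tlb, true, memory_order_relaxed);

	/* Roughly where the existing smp_wmb() sits. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&driver_buf, pool, memory_order_relaxed);
	return true;
}

/* CPU 2: driver code reads the address, then is_swiotlb_buffer(). */
static bool cpu2_is_swiotlb_buffer(void)
{
	int *buf = atomic_load_explicit(&driver_buf, memory_order_relaxed);

	if (!buf)
		return false;

	/* Roughly where smp_rmb() sits, before reading the flag. */
	atomic_thread_fence(memory_order_acquire);
	if (!atomic_load_explicit(&dma_uses_io_tlb, memory_order_relaxed))
		return false;

	/* The pool list walk must find the pool that CPU 1 used. */
	return atomic_load_explicit(&pools_next, memory_order_relaxed) == buf;
}
```

Running the three functions in order on a single thread only checks the
logic, of course; the interesting interleavings need a tool such as a
litmus test to explore.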

I think I can send a v2 of my patch now, with abundant comments on the
memory barriers.

> Another thing I noticed - the write in add_mem_pool() to mem->nslabs is
> not ordered with list_add_rcu(). I assume swiotlb_find_slots() doesn't
> need to access it since it just walks the mem->pools list.

That's correct. Writes to mem->nslabs are known to be racy, but it
doesn't matter. This is explained in commit 1aaa736815eb ("swiotlb:
allocate a new memory pool when existing pools are full"):

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).

Petr T