Re: io_ordering.rst vs. memory-barriers.txt

From: Akira Yokosawa
Date: Tue Nov 08 2022 - 19:28:55 EST


Hi Tony,

On Tue, 8 Nov 2022 18:07:37 -0500, Tony Battersby wrote:
> (resending in plaintext; damn Thunderbird upgrades...)
>
> While looking up documentation for PCI write posting I noticed that the
> example in Documentation/driver-api/io_ordering.rst seems to contradict
> Documentation/memory-barriers.txt:
>
> -----------
>
> Documentation/driver-api/io_ordering.rst:
>
> On some platforms, so-called memory-mapped I/O is weakly ordered. On such
> platforms, driver writers are responsible for ensuring that I/O writes to
> memory-mapped addresses on their device arrive in the order intended. This is
> typically done by reading a 'safe' device or bridge register, causing the I/O
> chipset to flush pending writes to the device before any reads are posted. A
> driver would usually use this technique immediately prior to the exit of a
> critical section of code protected by spinlocks. This would ensure that
> subsequent writes to I/O space arrived only after all prior writes (much like a
> memory barrier op, mb(), only with respect to I/O).
>
> A more concrete example from a hypothetical device driver::
>
> ...
> CPU A: spin_lock_irqsave(&dev_lock, flags)
> CPU A: val = readl(my_status);
> CPU A: ...
> CPU A: writel(newval, ring_ptr);
> CPU A: spin_unlock_irqrestore(&dev_lock, flags)
> ...
> CPU B: spin_lock_irqsave(&dev_lock, flags)
> CPU B: val = readl(my_status);
> CPU B: ...
> CPU B: writel(newval2, ring_ptr);
> CPU B: spin_unlock_irqrestore(&dev_lock, flags)
> ...
>
> In the case above, the device may receive newval2 before it receives newval,
> which could cause problems. Fixing it is easy enough though::
>
> ...
> CPU A: spin_lock_irqsave(&dev_lock, flags)
> CPU A: val = readl(my_status);
> CPU A: ...
> CPU A: writel(newval, ring_ptr);
> CPU A: (void)readl(safe_register); /* maybe a config register? */
> CPU A: spin_unlock_irqrestore(&dev_lock, flags)
> ...
> CPU B: spin_lock_irqsave(&dev_lock, flags)
> CPU B: val = readl(my_status);
> CPU B: ...
> CPU B: writel(newval2, ring_ptr);
> CPU B: (void)readl(safe_register); /* maybe a config register? */
> CPU B: spin_unlock_irqrestore(&dev_lock, flags)
>
> Here, the reads from safe_register will cause the I/O chipset to flush any
> pending writes before actually posting the read to the chipset, preventing
> possible data corruption.
>
> -----------
>
> Documentation/memory-barriers.txt:
>
> ==========================
> KERNEL I/O BARRIER EFFECTS
> ==========================
>
> Interfacing with peripherals via I/O accesses is deeply architecture and device
> specific. Therefore, drivers which are inherently non-portable may rely on
> specific behaviours of their target systems in order to achieve synchronization
> in the most lightweight manner possible. For drivers intending to be portable
> between multiple architectures and bus implementations, the kernel offers a
> series of accessor functions that provide various degrees of ordering
> guarantees:
>
> (*) readX(), writeX():
>
> The readX() and writeX() MMIO accessors take a pointer to the
> peripheral being accessed as an __iomem * parameter. For pointers
> mapped with the default I/O attributes (e.g. those returned by
> ioremap()), the ordering guarantees are as follows:
>
> 1. All readX() and writeX() accesses to the same peripheral are ordered
> with respect to each other. This ensures that MMIO register accesses
> by the same CPU thread to a particular device will arrive in program
> order.
>
> 2. A writeX() issued by a CPU thread holding a spinlock is ordered
> before a writeX() to the same peripheral from another CPU thread
> issued after a later acquisition of the same spinlock. This ensures
> that MMIO register writes to a particular device issued while holding
> a spinlock will arrive in an order consistent with acquisitions of
> the lock.
>
> -----------
>
> To summarize:
>
> io_ordering.rst says to use readX() before spin_unlock to order writeX()
> calls (on some platforms).
>
> memory-barriers.txt says writeX() calls are already ordered when holding
> the same spinlock.
>
> So...which one is correct?

>From quick glance of io_ordering.rst's git history, contents of this file
is never updated since the beginning of Git history (v2.6.12.rc2).

Which strongly suggests that you can ignore io_ordering.rst.

Thanks, Akira

PS:
Do we need to keep that outdated document???

I think Documentation/driver-api/device-io.rst is the one properly
maintained.

>
>
> There is another example of flushing posted PCI writes at the bottom of
> Documentation/PCI/pci.rst, but that one is consistent with
> memory-barriers.txt.
>
> Tony Battersby
> Cybernetics
>