Re: io_ordering.rst vs. memory-barriers.txt

From: Tony Battersby
Date: Tue Nov 08 2022 - 18:07:48 EST


(resending in plaintext; damn Thunderbird upgrades...)

While looking up documentation for PCI write posting I noticed that the
example in Documentation/driver-api/io_ordering.rst seems to contradict
Documentation/memory-barriers.txt:

-----------

Documentation/driver-api/io_ordering.rst:

On some platforms, so-called memory-mapped I/O is weakly ordered. On such
platforms, driver writers are responsible for ensuring that I/O writes to
memory-mapped addresses on their device arrive in the order intended. This is
typically done by reading a 'safe' device or bridge register, causing the I/O
chipset to flush pending writes to the device before any reads are posted. A
driver would usually use this technique immediately prior to the exit of a
critical section of code protected by spinlocks. This would ensure that
subsequent writes to I/O space arrived only after all prior writes (much like a
memory barrier op, mb(), only with respect to I/O).

A more concrete example from a hypothetical device driver::

...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
...

In the case above, the device may receive newval2 before it receives newval,
which could cause problems. Fixing it is easy enough though::

...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: (void)readl(safe_register); /* maybe a config register? */
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: (void)readl(safe_register); /* maybe a config register? */
CPU B: spin_unlock_irqrestore(&dev_lock, flags)

Here, the reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.

-----------

Documentation/memory-barriers.txt:

==========================
KERNEL I/O BARRIER EFFECTS
==========================

Interfacing with peripherals via I/O accesses is deeply architecture and device
specific. Therefore, drivers which are inherently non-portable may rely on
specific behaviours of their target systems in order to achieve synchronization
in the most lightweight manner possible. For drivers intending to be portable
between multiple architectures and bus implementations, the kernel offers a
series of accessor functions that provide various degrees of ordering
guarantees:

(*) readX(), writeX():

The readX() and writeX() MMIO accessors take a pointer to the
peripheral being accessed as an __iomem * parameter. For pointers
mapped with the default I/O attributes (e.g. those returned by
ioremap()), the ordering guarantees are as follows:

1. All readX() and writeX() accesses to the same peripheral are ordered
with respect to each other. This ensures that MMIO register accesses
by the same CPU thread to a particular device will arrive in program
order.

2. A writeX() issued by a CPU thread holding a spinlock is ordered
before a writeX() to the same peripheral from another CPU thread
issued after a later acquisition of the same spinlock. This ensures
that MMIO register writes to a particular device issued while holding
a spinlock will arrive in an order consistent with acquisitions of
the lock.

-----------

To summarize:

io_ordering.rst says to use readX() before spin_unlock to order writeX()
calls (on some platforms).

memory-barriers.txt says writeX() calls are already ordered when holding
the same spinlock.

So...which one is correct?


There is another example of flushing posted PCI writes at the bottom of
Documentation/PCI/pci.rst, but that one is consistent with
memory-barriers.txt.

Tony Battersby
Cybernetics