Re: [PATCH] locking/memory-barriers.txt: Improve documentation for writel() usage

From: Arnd Bergmann
Date: Thu Sep 15 2022 - 14:39:29 EST


On Thu, Sep 15, 2022, at 6:35 PM, Parav Pandit wrote:
>> From: Arnd Bergmann <arnd@xxxxxxxx>
>> Sent: Thursday, September 15, 2022 11:16 AM
>> On Thu, Sep 15, 2022, at 4:18 PM, Parav Pandit wrote:
>> >
>> > So more accurate documentation is to say that 'when using writel() a
>> > prior IO barrier is not needed ...'
>> >
>> > How about that?
>>
>> That's probably fine, not sure if it's worth changing.
>>
> I think it is worth because current documentation, indirectly (or
> incorrectly) indicate that
> "writel() does wmb() internally, so those drivers, who has difficulty
> in using writel() can do, wmb() + raw write".

I don't think it's wrong from a barrier perspective though:
if a driver uses writel_relaxed(), then the only way to guarantee
ordering is to have a full wmb() before it.

> And I sort of see above pattern in two drivers, and it is not good.
> It ends up doing dsb(st) on arm64, while needed barrier is only
> dmb(oshst).
>
> So to fix those two drivers, it is better to first avoid wmb()
> documentation reference when referring to writel().

Yes, this suggestion is correct. On x86 and a few others, I think
it's even worse: wmb() is an expensive barrier there, while writel()
is the same as writel_relaxed() because the barrier is implied by the
MMIO access.

It might help to spell this out and say that writel() is always
preferred over wmb()+writel_relaxed().
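
To illustrate the difference, here is a small userspace model of the
arm64 case mentioned above. Everything in it is a stand-in: the mock_*
helpers, the enum, and fake_doorbell are hypothetical names, not kernel
code. They only record which barrier each pattern would end up issuing.

```c
#include <stdint.h>

/* Barrier kinds from the arm64 example in this thread. */
enum barrier_kind { BARRIER_NONE, BARRIER_DSB_ST, BARRIER_DMB_OSHST };

static enum barrier_kind last_barrier = BARRIER_NONE;
static uint32_t fake_doorbell;	/* hypothetical device register */

/* Userspace stand-ins, NOT the kernel implementations. */
static void mock_wmb(void)   { last_barrier = BARRIER_DSB_ST; }
static void mock_iowmb(void) { last_barrier = BARRIER_DMB_OSHST; }

static void mock_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;		/* plain store, no ordering implied */
}

static void mock_writel(uint32_t val, volatile uint32_t *addr)
{
	mock_iowmb();		/* the lighter barrier writel() needs */
	mock_writel_relaxed(val, addr);
}

/* Discouraged: explicit heavy barrier followed by a relaxed write. */
static enum barrier_kind ring_doorbell_discouraged(uint32_t val)
{
	mock_wmb();
	mock_writel_relaxed(val, &fake_doorbell);
	return last_barrier;
}

/* Preferred: let writel() supply exactly the barrier it requires. */
static enum barrier_kind ring_doorbell_preferred(uint32_t val)
{
	mock_writel(val, &fake_doorbell);
	return last_barrier;
}
```

Both variants perform the same store; only the barrier in front of it
differs, which is the cost the documentation wording should not
encourage.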

Side note: there are several other problems with wmb()+__raw_writel(),
which on many architectures does not guarantee any atomicity of
the access (a word store could get split into four byte stores),
breaks endianness assumptions and may still not provide the correct
barrier semantics.
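
The endianness point can be shown with a short userspace sketch. The
two helpers below are hypothetical stand-ins that model only the byte
order (not the access width or any barrier): writel() always presents
the value to the device as little-endian, while a raw store uses
whatever order the CPU happens to have.

```c
#include <stdint.h>
#include <string.h>

/* Models the byte order writel() guarantees: always little-endian. */
static void store_le32(uint8_t *reg, uint32_t val)
{
	reg[0] = (uint8_t)(val & 0xff);
	reg[1] = (uint8_t)((val >> 8) & 0xff);
	reg[2] = (uint8_t)((val >> 16) & 0xff);
	reg[3] = (uint8_t)((val >> 24) & 0xff);
}

/* Models a raw store: bytes land in host order, which differs between
 * little-endian and big-endian CPUs. */
static void store_native32(uint8_t *reg, uint32_t val)
{
	memcpy(reg, &val, sizeof(val));
}
```

On a little-endian host the two produce identical bytes, which is why
this kind of bug tends to go unnoticed until the driver runs on a
big-endian machine.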

>> I see that there is more going on with that function, at least the loop in
>> post_send_nop() probably just wants to use __iowrite64_copy(), but that
>> also has no barrier in it, while changing mlx5_write64() to use iowrite64be()
>> or similar would of course add excessive barriers inside of the loop.
>
> True. All other conversion seems possible.
> For post_send_nop(), __iowmb() needs to be exposed, which is not
> available today and it is only one-off user,
> I am inclined to keep post_send_nop() as-is, but want to
> improve/correct rest of the callers in these two drivers.

__iowmb() is architecture-specific and does not have a well-defined
behavior. wmb() is probably the best choice for post_send_nop().
Alternatively, one could use __iowrite64_copy() for the first few
fields followed by a single iowrite64be() for the last one.

If you think we need something better than that, maybe having
an iowrite64_copy() (without leading __) that includes a barrier
would work.
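
A rough userspace sketch of that idea, under loud assumptions: the
function name sketch_iowrite64_copy() is hypothetical, the C11 fence is
only a stand-in for whatever write barrier the architecture would use,
and placing the barrier before the copy (as writel() does) is my guess
at the semantics, not something that has been decided.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one barrier up front, then a barrier-free copy
 * of 'count' 64-bit words, so the loop itself stays cheap.
 */
static void sketch_iowrite64_copy(volatile uint64_t *to,
				  const uint64_t *from, size_t count)
{
	size_t i;

	/* Stand-in for a single I/O write barrier (assumption). */
	atomic_thread_fence(memory_order_release);

	for (i = 0; i < count; i++)
		to[i] = from[i];	/* models __raw_writeq() per word */
}
```

The point is that the barrier cost is paid once per copy rather than
once per word, which is what per-word iowrite64be() in the loop would
cost.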

Arnd