Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Michael Ellerman
Date: Mon Jul 16 2018 - 10:40:34 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> On Fri, Jul 13, 2018 at 11:15:26PM +1000, Michael Ellerman wrote:
...
>>
>>
>> So 18-32% slower, or 23-47 cycles.
>
> Very good info. Note that another option is to put the SYNC in lock() it
> doesn't really matter which of the two primitives gets it. I don't
> suppose it really matters for timing either way around.

If the numbers can be trusted it is actually slower to put the sync in
lock, at least on one of the machines:

Time
lwsync_sync 84,932,987,977
sync_lwsync 93,185,930,333

On the other machine it's slower but only by 0.1%, so that's slightly
weird.

The other advantage of putting the sync in unlock is we could get rid of
our SYNC_IO logic, which conditionally puts a sync in unlock to order
IO accesses vs unlock.


>> Next week I can do some macro benchmarks, to see if it's actually
>> detectable at all.


I guess arguably it's not a very macro benchmark, but we have a
context_switch benchmark in the tree[1] which we often use to tune
things, and it degrades badly. It just spins up two threads and has them
ping-pong using yield.


The numbers are context switch iterations, so more == better.

| Before | After | Change | Change %
+------------+------------+------------+----------
| 35,601,160 | 32,371,164 | -3,229,996 | -9.07%
| 35,762,126 | 32,438,798 | -3,323,328 | -9.29%
| 35,690,870 | 32,353,676 | -3,337,194 | -9.35%
| 35,440,346 | 32,336,750 | -3,103,596 | -8.76%
| 35,614,868 | 32,676,378 | -2,938,490 | -8.25%
| 35,659,690 | 32,462,624 | -3,197,066 | -8.97%
| 35,594,058 | 32,403,922 | -3,190,136 | -8.96%
| 35,682,682 | 32,353,146 | -3,329,536 | -9.33%
| 35,954,454 | 32,306,168 | -3,648,286 | -10.15%
| 35,849,314 | 32,291,094 | -3,558,220 | -9.93%
----------+------------+------------+------------+----------
Average | 35,684,956 | 32,399,372 | -3,285,584 | -9.21%
Std Dev | 143,877 | 111,385 |
Std Dev % | 0.40% | 0.34% |

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/benchmarks/context_switch.c


I'll do some kernbench runs tomorrow and see if it shows up there.


>> My personal preference would be to switch to sync, we don't want to be
>> the only arch finding (or not finding!) exotic ordering bugs.
>>
>> But we'd also rather not make our slow locks any slower than they have
>> to be.
>
> I completely understand, but I'll get you beer (lots) if you do manage
> to make SYNC happen :-) :-)

Just so we're clear Fosters is not beer :)

cheers