Re: x86 - cpu_relax - why nop vs. pause?

From: Michael Breuer
Date: Sun Feb 07 2010 - 22:50:36 EST


On 2/7/2010 4:15 PM, Michael Breuer wrote:
On 02/07/2010 03:08 PM, Michael Breuer wrote:
On 2/7/2010 1:14 PM, Mike Galbraith wrote:
...
Case1 - asm volatile("pause" ::: "memory");
0000000000400480 <main>:
400480: f3 90 pause
400482: c3 retq
400483: 90 nop

...

Case3 - asm volatile("rep;pause" ::: "memory")
0000000000400480 <main>:
400480: f3 f3 90 pause
400483: c3 retq
400484: 90 nop
_______
Note the difference between the opcodes in case 1 and case 3, and the mess made by the compiler in case 2.
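To make the byte-level difference concrete: 0xF3 is the REP/REPE prefix byte and 0x90 is the one-byte NOP opcode, so f3 90 is literally "rep; nop", which is the encoding assigned to PAUSE. Case 3 carries a second F3 byte in front of the same opcode. The following is only a sketch of that observation; classify_relax() and struct relax_encoding are hypothetical names made up for illustration, and the code says nothing about how any particular CPU treats the extra prefix.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of inspecting one short x86 byte sequence. */
struct relax_encoding {
    size_t f3_prefixes;  /* number of leading 0xF3 (REP/REPE) prefix bytes */
    int    ends_in_0x90; /* 1 if the byte after the prefixes is 0x90 (NOP opcode) */
};

/*
 * Hypothetical helper, not part of any toolchain: count the leading 0xF3
 * bytes and check whether the byte that follows them is 0x90.
 */
static struct relax_encoding classify_relax(const uint8_t *code, size_t len)
{
    struct relax_encoding r = { 0, 0 };
    size_t i = 0;

    while (i < len && code[i] == 0xF3) {
        r.f3_prefixes++;
        i++;
    }
    if (i < len && code[i] == 0x90)
        r.ends_in_0x90 = 1;
    return r;
}

/* Byte sequences copied from the disassembly of case 1 and case 3. */
static const uint8_t case1_pause[]     = { 0xF3, 0x90 };
static const uint8_t case3_rep_pause[] = { 0xF3, 0xF3, 0x90 };
```

Calling classify_relax(case1_pause, sizeof case1_pause) reports one prefix and classify_relax(case3_rep_pause, sizeof case3_rep_pause) reports two, with the same 0x90 opcode in both cases.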

As to benchmarks - I've checked a few things, no formal or lasting stuff... but striking at first glance:

1) At idle, perf top shows time spent in _raw_spin_lock dropping from ~35% to ~25%.
2) Running a media transcode (single core - handbrakecli): frame rate increased by about 5-10%.
3) During file-intensive operations (#2, above, or copying large files - ext4 on software raid6) - latencytop shows a decrease in the time to write a page to disk from about 120ms to about 90ms.

Disregard case 2 - was missing -O3. With -O3 or -O2 rep;nop and pause are identical. The interesting case is rep;pause which is different and seems more efficient.
Setting that aside... totally perplexed, I retested a bit. It seems something else had gone wrong, which is what led me to try 'rep;pause' instead of 'pause'. The resulting opcode is f3 f3 90, as noted above.

I do see what seems to be a small but noticeable performance improvement - no idea if there's a downside, and also no idea what f3 f3 90 does vs. f3 90. Might be something interesting, or maybe not.
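For readers who want to try the variants in user space, here is a minimal sketch of a cpu_relax()-style hint in a bounded spin loop. It assumes GCC or Clang inline asm syntax; relax_pause(), relax_rep_nop() and spin_until_set() are hypothetical names, not kernel functions. Only the two spellings that assemble to f3 90 are included. Swapping the asm string for "rep;pause" is what Case 3 did, and that variant is deliberately left out here.

```c
#include <stdatomic.h>

/* "pause" spelling; falls back to a plain compiler barrier off x86. */
static inline void relax_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* "rep; nop" spelling; assembles to the same f3 90 bytes as "pause". */
static inline void relax_rep_nop(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/*
 * Hypothetical helper: spin until *flag becomes nonzero or max_spins
 * iterations have elapsed, calling relax() once per iteration.
 * Returns the number of iterations spent waiting.
 */
static unsigned long spin_until_set(atomic_int *flag, void (*relax)(void),
                                    unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!atomic_load_explicit(flag, memory_order_acquire) &&
           spins < max_spins) {
        relax();
        spins++;
    }
    return spins;
}
```

Passing the relax function as a parameter keeps the loop identical across variants, so any timing difference comes from the hint itself rather than from the surrounding code.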
Test scenario:

Boot clean to single-user mode. Perform tiotest -8 five times.
%cpu is %usr + %sys as reported by tiotest.

Results:
Writes
pause:     1.14 sec; 72.01MB/sec; 322.44%cpu
rep;pause: 1.12 sec; 70.4MB/sec; 311.58%cpu
Random Writes
pause:     3.7 sec; 8.51MB/sec; 66.48%cpu
rep;pause: 3.46 sec; 9.04MB/sec; 72.34%cpu
Reads
pause:     11557.48MB/sec; 6040.74%cpu
rep;pause: 11620.15MB/sec; 5974.90%cpu
Random Reads
pause:     11416.9MB/sec; 5330.50%cpu
rep;pause: 11786.99MB/sec; 5118.66%cpu
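To put the figures above on a common scale, the relative change from pause to rep;pause can be computed per test. This is only an arithmetic sketch over the posted numbers; percent_change(), struct tiotest_row and print_changes() are hypothetical names, and nothing here says whether the differences are statistically meaningful.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change from baseline to candidate, in percent. */
static double percent_change(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}

/* One line pair from the results: throughput in MB/sec and %cpu. */
struct tiotest_row {
    const char *test;
    double pause_mbs, rep_pause_mbs;
    double pause_cpu, rep_pause_cpu;
};

/* Values copied from the results listed above. */
static const struct tiotest_row rows[] = {
    { "Writes",          72.01,    70.4,     322.44,  311.58  },
    { "Random Writes",    8.51,     9.04,     66.48,   72.34  },
    { "Reads",        11557.48, 11620.15,   6040.74, 5974.90  },
    { "Random Reads", 11416.9,  11786.99,   5330.50, 5118.66  },
};

/* Print the throughput and %cpu change of rep;pause relative to pause. */
static void print_changes(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        printf("%-14s MB/sec %+6.2f%%  %%cpu %+6.2f%%\n",
               rows[i].test,
               percent_change(rows[i].pause_mbs, rows[i].rep_pause_mbs),
               percent_change(rows[i].pause_cpu, rows[i].rep_pause_cpu));
    }
}
```

On these numbers the throughput change is roughly -2.2% for writes, +6.2% for random writes, +0.5% for reads and +3.2% for random reads, so the figures do not all move in the same direction.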

