Re: Direct rdtsc call side-effect

From: H. Peter Anvin
Date: Mon Jun 05 2023 - 12:33:44 EST


On 6/5/23 08:54, David Laight wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: 05 June 2023 15:44
>>
>> On Mon, Jun 05 2023 at 10:27, David Laight wrote:
>>> It has to be said that using it as a time source was fundamentally
>>> a bad idea.
>>
>> Too bad you weren't around many moons ago and educated us on that. That
>> would have saved us lots of trouble and work.
>
> Indeed :-)
> I do remember thinking the TSC was really a good time source when
> I first saw it being done about 30 years ago.


The TSC is certainly not perfect, partly because, ironically enough, it was introduced just *before* out-of-order execution and power management entered the x86 world.

It is no secret that it has been slow to catch up. It was easy to put a counter in; it is a *lot* harder to make it work in all the possible scenarios in the power-managed, out-of-order world.

It is one of my personal pet projects in the architecture work to push it that last distance; we are not there yet.


> I'm thinking of benchmarking the IP checksum code, where you are
> trying to find out how many bytes/clock the loop is doing.
> On recent x86-64 the theoretical limit (without fighting AVX) is 16
> bytes/clock; I've measured 12, and 8 is relatively easy.
> (The current asm code runs at 4 on older cpus and doesn't get
> much above 6 at all.)
>
> What happens is that the cpu clock speeds up as soon as the
> test starts, but the TSC frequency stays constant.
> So you can only use the TSC to measure time, not execution speed.
>
> Run enough copies of 'while :; do :; done &' to make all but one
> cpu busy and the cpus all speed up, giving completely different
> TSC counts for short loops.
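A minimal sketch of the naive measurement being described, assuming GCC or Clang on x86-64 (`__rdtsc()` from `<x86intrin.h>`, `_mm_lfence()` from `<emmintrin.h>`). The workload `csum_simple()` is a hypothetical stand-in (a plain RFC 1071-style ones'-complement sum), not the kernel's checksum code, and `tsc_ticks()` is likewise an illustrative name. The result is in TSC ticks, which on CPUs with an invariant TSC advance at a fixed reference rate. So `len * reps / ticks` gives bytes per TSC tick, not bytes per core clock, and it shifts whenever the core frequency does.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_lfence() */
#include <x86intrin.h>  /* __rdtsc(): GCC/Clang, x86 only */

/* Hypothetical stand-in workload: RFC 1071-style ones'-complement sum over
 * little-endian 16-bit words. Not the kernel's checksum implementation. */
static uint16_t csum_simple(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint16_t)(p[i] | (p[i + 1] << 8));
    if (len & 1)
        sum += p[len - 1];              /* trailing odd byte, zero-padded */
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TSC ticks (a time measure, not core clocks) spent running the checksum
 * `reps` times over buf[0..len). */
static uint64_t tsc_ticks(const uint8_t *buf, size_t len, unsigned reps)
{
    static volatile uint16_t sink;      /* keeps each result observable */
    _mm_lfence();                       /* order RDTSC against earlier work */
    uint64_t start = __rdtsc();
    _mm_lfence();
    for (unsigned r = 0; r < reps; r++) {
        sink = csum_simple(buf, len);
        __asm__ volatile("" ::: "memory");  /* stop the call being hoisted */
    }
    _mm_lfence();
    uint64_t end = __rdtsc();
    return end - start;
}
```

Running this with and without the `while :; do :; done &` background load is one way to see the effect described: the tick count for the same loop changes even though the work done per iteration does not.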


That is the reason for architecturally fixed performance counters.
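One way to count core clocks instead, sketched under the assumption of Linux with `perf_event_open(2)` usable by the caller. `PERF_COUNT_HW_CPU_CYCLES` counts unhalted core cycles and `PERF_COUNT_HW_REF_CPU_CYCLES` counts reference cycles. On Intel parts the kernel typically schedules these on the architectural fixed counters (CPU_CLK_UNHALTED.THREAD and CPU_CLK_UNHALTED.REF_TSC), but that placement is not guaranteed. `count_hw_event()` and the `sum_bytes()` workload are hypothetical names. Opening the counter can fail (no PMU exposed in a VM, a strict `perf_event_paranoid`, seccomp in containers), so the helper returns -1 in that case.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>  /* struct perf_event_attr, PERF_* constants */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>       /* SYS_perf_event_open */
#include <sys/types.h>
#include <unistd.h>            /* syscall(), read(), close() */

/* Hypothetical workload: sum all bytes of the buffer. */
static uint64_t sum_bytes(const uint8_t *p, size_t len)
{
    uint64_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += p[i];
    return s;
}

/*
 * Count a generic hardware event (e.g. PERF_COUNT_HW_CPU_CYCLES or
 * PERF_COUNT_HW_REF_CPU_CYCLES) for the calling thread while the workload
 * runs `reps` times. Returns the count, or -1 if the counter is unavailable.
 */
static int64_t count_hw_event(uint64_t config, const uint8_t *buf, size_t len,
                              unsigned reps)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;        /* created stopped; enabled just around the loop */
    attr.exclude_kernel = 1;  /* user space only, allowed at perf_event_paranoid <= 2 */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread, on any CPU; no group fd, no flags */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    static volatile uint64_t sink;  /* keeps the workload from being discarded */
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (unsigned r = 0; r < reps; r++) {
        sink = sum_bytes(buf, len);
        __asm__ volatile("" ::: "memory");  /* stop the call being hoisted */
    }
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;  /* default read_format: a single u64 value */
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == (ssize_t)sizeof(count) ? (int64_t)count : -1;
}
```

Dividing the bytes processed by the `PERF_COUNT_HW_CPU_CYCLES` count gives bytes per core clock. The ratio of the cycles and ref-cycles counts over the same loop shows how far the core clock ran from the reference rate.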

-hpa