Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()

From: Peter Zijlstra
Date: Thu Oct 11 2018 - 03:31:41 EST


On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote:
> While looking at native_sched_clock() disassembly I had
> the surprise to see the compiler (gcc 7.3 here) had
> optimized out the loop, meaning the code is broken.
>
> Using the documented and approved API not only fixes the bug,
> it also makes the code more readable.
>
> Replacing five this_cpu_read() by one this_cpu_ptr() makes
> the generated code smaller.

It does not for me; the resulting asm is actually larger.

You're quite right the loop went missing; no idea wth that compiler is
smoking (gcc-8.2 for me). In order to eliminate that loop it needs to
think that two consecutive loads of this_cpu_read(cyc2ns.seq.sequence)
will return the same value. But this_cpu_read() is an asm() statement,
it _should_ not assume such.
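
For illustration only, here is a minimal userspace sketch of the read loop
under discussion. It is not the kernel code: the struct layout and the
names c2n_data, c2n_latch and c2n_read are hypothetical, READ_ONCE() is a
local stand-in built on a volatile cast, and the memory barriers a real
concurrent reader needs are left out. The point is only that the two loads
of seq are volatile accesses, so the compiler may not assume they return
the same value and drop the retry.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): a volatile load that
 * the compiler may neither merge with another load nor drop. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, simplified layout; field names are illustrative. */
struct c2n_data {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

struct c2n_latch {
	unsigned int seq;		/* bumped by the writer around updates */
	struct c2n_data data[2];	/* reader uses the copy picked by seq & 1 */
};

/* Copy out a snapshot; retry if seq changed while we were reading. */
static void c2n_read(struct c2n_latch *l, struct c2n_data *out)
{
	unsigned int seq, idx;

	do {
		seq = READ_ONCE(l->seq);
		idx = seq & 1;

		out->offset = READ_ONCE(l->data[idx].offset);
		out->mul    = READ_ONCE(l->data[idx].mul);
		out->shift  = READ_ONCE(l->data[idx].shift);
	} while (seq != READ_ONCE(l->seq));
}
```

If the loads of seq were plain accesses with no volatile qualifier, the
compiler would be free to treat the second one as redundant, and the loop
condition would fold to false.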

We assume in a number of locations that this_cpu_read() implies
READ_ONCE(), so this really should not happen.

The reason it was written using this_cpu_read() is so that it can use
%gs: prefixed instructions and avoid ever loading that percpu offset and
doing manual address computation.
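
A rough userspace analogue of that trade-off, assuming x86-64 Linux where
thread-local storage is addressed through %fs in much the same way the
kernel addresses per-CPU data through %gs. The names tl_counter,
read_direct, counter_ptr and read_via_ptr are made up for this sketch, and
the instruction sequences mentioned in the comments are what compilers
typically emit, not a guarantee.

```c
#include <stdint.h>

/* Userspace analogue of a per-CPU variable: one instance per thread. */
static _Thread_local uint64_t tl_counter;

/* Direct access: can typically be a single %fs:-relative load, with no
 * need to materialise the thread base in a register. */
static uint64_t read_direct(void)
{
	return tl_counter;
}

/* Taking the address means loading the thread base and adding the
 * variable's offset to it, i.e. the manual address computation. */
static uint64_t *counter_ptr(void)
{
	return &tl_counter;
}

/* Access through the computed pointer; an optimising compiler may fold
 * this back into the direct form when it can see through the call. */
static uint64_t read_via_ptr(void)
{
	return *counter_ptr();
}
```

Both readers return the same value; the difference is only in how the
address is formed, which is what the generated-code size argument is about.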

Let me prod at this with a sharp stick.