Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

From: Chris Metcalf
Date: Fri Dec 09 2016 - 12:32:29 EST


On 12/9/2016 3:30 AM, Peter Zijlstra wrote:
> On Fri, Dec 09, 2016 at 07:38:47AM +0100, Peter Zijlstra wrote:
> On Fri, Dec 09, 2016 at 06:26:38AM +0100, Peter Zijlstra wrote:
> Just for giggles, on tilegx the branch is actually slower than doing the
> mult unconditionally.
>
> The problem is that the two multiplies would otherwise completely
> pipeline, whereas with the conditional you serialize them.
> On my Haswell laptop the unconditional version is faster too.
> Only when using x86_64 instructions, once I fixed the i386 variant it
> was slower, probably due to register pressure and the like.
>
> (came to light while talking about why the mul_u64_u32_shr() fallback
> didn't work right for them, which was a combination of the above issue
> and the fact that their compiler 'lost' the fact that these are
> 32x32->64 mults and did 64x64 ones instead).
> Turns out using GCC-6.2.1 we have the same problem on i386, GCC doesn't
> recognise the 32x32 mults and generates crap.
>
> This used to work :/
> Do we want something like so?
>
> ---
>  arch/tile/include/asm/Kbuild  |  1 -
>  arch/tile/include/asm/div64.h | 14 ++++++++++++++
>  arch/x86/include/asm/div64.h  | 10 ++++++++++
>  include/linux/math64.h        | 26 ++++++++++++++++++--------
>  4 files changed, 42 insertions(+), 9 deletions(-)

Untested, but I looked at it closely, and it seems like a decent idea.

Acked-by: Chris Metcalf <cmetcalf@xxxxxxxxxxxx> [for tile]

Of course if this is pushed up, it will then probably be too tempting for me not
to add the tilegx-specific mul_u64_u32_shr() to take advantage of pipelining
the two 32x32->64 multiplies :-)
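
For anyone following along, here is a rough user-space sketch of the two
shapes being compared above: a fallback that branches on the high word, and
one that does both multiplies unconditionally so they have no dependency on
each other. This is only an illustration, not the code from the patch. The
names mul32x32(), mul_shr_branch() and mul_shr_nobranch() are made up for
this example, and shift is assumed to be in the range 0..32.

```c
#include <stdint.h>

/*
 * Hypothetical helper: a widening 32x32->64 multiply. Only one operand is
 * cast, so the compiler can see that both inputs are 32 bits wide.
 */
static inline uint64_t mul32x32(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/*
 * Branchy variant: skip the high multiply when the upper word is zero.
 * Assumes 0 <= shift <= 32; the result is truncated to 64 bits.
 */
static inline uint64_t mul_shr_branch(uint64_t a, uint32_t mul,
				      unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret = mul32x32(al, mul) >> shift;

	if (ah)
		ret += mul32x32(ah, mul) << (32 - shift);

	return ret;
}

/*
 * Unconditional variant: the two multiplies are independent of each other,
 * so a machine that can pipeline them is free to do so.
 * Same assumptions as above.
 */
static inline uint64_t mul_shr_nobranch(uint64_t a, uint32_t mul,
					unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t lo = mul32x32(al, mul) >> shift;
	uint64_t hi = mul32x32(ah, mul) << (32 - shift);

	return lo + hi;
}
```

Both variants compute the same value; which one is faster depends on the
machine and the compiler, as the numbers quoted above suggest.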

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com