RE: [tip: x86/core] x86/retpoline: Simplify retpolines

From: David Laight
Date: Tue Apr 06 2021 - 04:56:56 EST


From: tip-bot2@xxxxxxxxxxxxx
> Sent: 03 April 2021 12:11
...
> Notice that since the longest alternative sequence is now:
>
>    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
>    5:   f3 90                   pause
>    7:   0f ae e8                lfence
>    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
>    c:   48 89 04 24             mov    %rax,(%rsp)
>   10:   c3                      retq
>
> 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW, if
> we can shrink the retpoline by 1 byte we can pack it more densely).

Every time I see this I can't help feeling that doing something
(aka anything) to get the 'mov' and 'retq' into the same 16 byte
code fetch/decode block would be advantageous.

Even something like:
        call    1f
        pause
        jmp     2f
1:      mov     %rax,(%rsp)
        retq
2:      pause
        lfence
        jmp     2b
might meet all the requirements for the retpoline while
allowing the 'mov' and 'retq' to be decoded in the same clock.
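
As a rough check of the byte offsets (a sketch only, not part of the
patch): the Python below lays out both sequences using the instruction
lengths from the disassembly quoted above. It assumes the slot starts on
a 16 byte boundary and that both 'jmp's in the reordered version assemble
to the 2 byte short form. The helper names are made up for illustration.

```python
# Instruction lengths in bytes, read off the quoted disassembly.
# Assumption: every 'jmp' uses the 2 byte short (rel8) encoding.
SIZES = {"call": 5, "pause": 2, "lfence": 3, "jmp": 2, "mov": 4, "retq": 1}


def layout(insns, block=16):
    """Return ([(name, start, first_block, last_block), ...], total_bytes).

    Assumption: the sequence starts at offset 0 of an aligned block.
    """
    rows, off = [], 0
    for name in insns:
        size = SIZES[name]
        rows.append((name, off, off // block, (off + size - 1) // block))
        off += size
    return rows, off


def mov_retq_same_block(insns, block=16):
    """True if every byte of 'mov' and 'retq' lies in one fetch block."""
    rows, _ = layout(insns, block)
    blocks = set()
    for name, _, first, last in rows:
        if name in ("mov", "retq"):
            blocks.update((first, last))
    return len(blocks) == 1


# Sequence from the quoted commit message.
current = ["call", "pause", "lfence", "jmp", "mov", "retq"]
# Reordered sequence suggested in this mail.
proposed = ["call", "pause", "jmp", "mov", "retq", "pause", "lfence", "jmp"]

for label, seq in (("current", current), ("proposed", proposed)):
    rows, total = layout(seq)
    print(label, "total bytes:", total,
          "mov+retq in same block:", mov_retq_same_block(seq))
```

Under those assumptions the current sequence has 'mov' at 0xc-0xf and
'retq' at 0x10, so they straddle the 16 byte boundary, while the
reordered one has them at 0x9-0xd. The reordered sequence comes to
21 bytes, so it still fits the 32 byte slot but does not get it down
to 16.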

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)