Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support

From: Paul Turner
Date: Fri Jan 05 2018 - 07:20:39 EST


On Fri, Jan 5, 2018 at 3:26 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> On 05/01/2018 11:28, Paul Turner wrote:
>>
>> The "pause; jmp" sequence proved minutely faster than "lfence;jmp" which is why
>> it was chosen.
>>
>> "pause; jmp"  33.231 cycles/call  9.517 ns/call
>> "lfence; jmp" 33.354 cycles/call  9.552 ns/call
>
> Do you have timings for a non-retpolined indirect branch with the
> predictor suppressed via IBRS=1? So at least we can compute the break
> even point.

The data I collected previously showed the run-time cost to be a wash.
On Skylake, an indirect branch executed with IBRS=1 and a retpolined
indirect branch cost within a few cycles of each other.
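For scale, the two speculation-trap timings quoted above differ by only
about 0.12 cycles (0.035 ns) per call, under 0.4%. A quick check of that
arithmetic, using only the figures quoted earlier in the thread; the
helper names are illustrative, and the implied clock rate (cycles
divided by ns, roughly 3.49 GHz for both rows) is derived from those
figures rather than separately measured:

```python
# Per-call timings quoted earlier in the thread.
TIMINGS = {
    "pause; jmp": {"cycles": 33.231, "ns": 9.517},
    "lfence; jmp": {"cycles": 33.354, "ns": 9.552},
}


def gap(fast: str, slow: str) -> dict:
    """Per-call difference between two sequences (slow minus fast)."""
    a, b = TIMINGS[fast], TIMINGS[slow]
    d_cycles = b["cycles"] - a["cycles"]
    return {
        "cycles": d_cycles,
        "ns": b["ns"] - a["ns"],
        # Relative saving, expressed against the slower sequence.
        "percent": 100.0 * d_cycles / b["cycles"],
    }


def implied_ghz(name: str) -> float:
    """Clock rate implied by one row: cycles per ns equals GHz."""
    row = TIMINGS[name]
    return row["cycles"] / row["ns"]


if __name__ == "__main__":
    g = gap("pause; jmp", "lfence; jmp")
    print(f"{g['cycles']:.3f} cycles, {g['ns']:.3f} ns, "
          f"{g['percent']:.2f}% per call")
    for name in TIMINGS:
        print(f"{name}: ~{implied_ghz(name):.2f} GHz")
```

Both rows implying the same clock rate is a useful sanity check that
the cycle and ns columns came from the same runs.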

The costs to consider when making a choice here are:

- The transition overheads: how frequently you will be switching in
and out of protected code (IBRS needs to be enabled and disabled at
these boundaries).
- The frequency at which you will be executing protected code on one
sibling and unprotected code on another (enabling IBRS may affect
sibling execution, depending on SKU).
- The implementation cost (retpoline requires auditing/rebuilding your
target, while IBRS can be used out of the box).
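To make the break-even question concrete, here is a minimal sketch of
the first point above, assuming IBRS pays a fixed cost per enable/disable
pair at each protected-code boundary and each approach pays a fixed cost
per indirect branch. All names and the numbers in the example are
hypothetical placeholders, not measurements; the sibling effect and the
audit/rebuild cost are deliberately left out of the model:

```python
import math
from dataclasses import dataclass


@dataclass
class CostModel:
    # All values in cycles. Hypothetical placeholders, not measured data.
    transition: float        # one IBRS enable + disable pair at a boundary
    ibrs_branch: float       # indirect branch executed with IBRS=1
    retpoline_branch: float  # retpolined indirect branch


def ibrs_cost(m: CostModel, branches: float, transitions: float) -> float:
    """Total cycles when protecting with IBRS."""
    return transitions * m.transition + branches * m.ibrs_branch


def retpoline_cost(m: CostModel, branches: float) -> float:
    """Total cycles when protecting with retpoline (no boundary cost)."""
    return branches * m.retpoline_branch


def break_even(m: CostModel) -> float:
    """Indirect branches per protected-code entry at which both cost the same.

    Above this, IBRS amortizes its transition cost and comes out cheaper;
    below it, retpoline is cheaper. Returns inf when a retpolined branch is
    no more expensive than an IBRS=1 branch, i.e. retpoline never loses here.
    """
    per_branch_gap = m.retpoline_branch - m.ibrs_branch
    if per_branch_gap <= 0:
        return math.inf
    return m.transition / per_branch_gap


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the shape of the trade-off.
    m = CostModel(transition=600.0, ibrs_branch=30.0, retpoline_branch=33.0)
    print(f"break-even: {break_even(m):.0f} branches per transition")
```

With these placeholder numbers, IBRS only wins once more than 200
indirect branches run per protected-code entry. Because the break-even
point is the transition cost divided by the per-branch gap, per-branch
costs within a few cycles of each other push it high, and the transition
overhead ends up deciding the comparison.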
