Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

From: Paul Turner
Date: Mon Jan 08 2018 - 21:49:26 EST


On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?

On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:

(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence. This was not cited by
architects as a way of presenting speculation. Indeed, as David
points out, we'd consider using this within the sequence without the
loop.


If the claim above is true -- which (1) actually appears to contradict
-- it seems to bear stronger validation. Particularly since that in
the suggested sequences we can fit the jmps within the space we get
for free by aligning the call targets.