Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

From: Paul Turner
Date: Mon Jan 08 2018 - 21:49:26 EST

Next message: Martin K. Petersen: "Re: [PATCH][scsi-next] scsi: aacraid: remove redundant setting of variable c"
Previous message: lipeng (Y): "Re: [PATCH net-next 12/20] net: hns3: Add packet statistics of netdev"
In reply to: David Woodhouse: "Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch"
Next in thread: David Woodhouse: "Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?

On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:

(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence. This was not cited by
architects as a way of presenting speculation. Indeed, as David
points out, we'd consider using this within the sequence without the
loop.

If the claim above is true -- which (1) actually appears to contradict
-- it seems to bear stronger validation. Particularly since that in
the suggested sequences we can fit the jmps within the space we get
for free by aligning the call targets.

Next message: Martin K. Petersen: "Re: [PATCH][scsi-next] scsi: aacraid: remove redundant setting of variable c"
Previous message: lipeng (Y): "Re: [PATCH net-next 12/20] net: hns3: Add packet statistics of netdev"
In reply to: David Woodhouse: "Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch"
Next in thread: David Woodhouse: "Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]