Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

From: David Woodhouse
Date: Mon Jan 08 2018 - 22:06:06 EST


On Mon, 2018-01-08 at 18:48 -0800, Paul Turner wrote:
> On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
> > >
> > > On IRC, Arjan assures me that 'pause' here really is sufficient as a
> > > speculation trap. If we do end up returning back here as a
> > > misprediction, that 'pause' will stop the speculative execution on
> > > affected CPUs even though it isn't *architecturally* documented to do
> > > so.
> > >
> > > Arjan, can you confirm that in email please?
> >
> > That actually doesn't make sense to me. If 'pause' alone is sufficient,
> > then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> > retpoline itself?
> >
> > Arjan?
> On further investigation, I don't understand any of the motivation for
> the changes here:
> - It micro-benchmarks several cycles slower than the suggested
> implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
> alignment]
> - It's much larger in terms of .text size (120 bytes @ 16 calls, 218
> bytes @ 30 calls) vs (61 bytes)
> - I'm not sure it's universally correct in preventing speculation:
>
> (1) I am able to observe a small timing difference between executing
> "1: pause; jmp 1b;" and "pause" in the speculative path.
> ÂÂÂÂÂGiven that alignment is otherwise identical, this should only
> occur if execution is non-identical, which would require speculative
> execution to proceed beyond the pause.
> (2) When we proposed and reviewed the sequence.ÂÂThis was not cited by
> architects as a way of presenting speculation.ÂÂIndeed, as David
> points out, we'd consider using this within the sequence without the
> loop.
>
>
> If the claim above is true -- which (1) actually appears to contradict
> -- it seems to bear stronger validation.ÂÂParticularly since that in
> the suggested sequences we can fit the jmps within the space we get
> for free by aligning the call targets.

Some of the discrepancies are because it's been filtered through Intel
and I may not have had your latest version.

I'm going to revert to your version from
https://support.google.com/faqs/answer/7625886Ââ; for the retpoline
(i.e. adding back the alignment) and the RSB stuffing.

If Intel have a sequence which is simpler and guaranteed to work, I'll
let them post that with an authoritative statement from the CPU
architects. At this point, we really need to get on with rolling in the
other parts on top.

Attachment: smime.p7s
Description: S/MIME cryptographic signature