Re: [PATCH RFC 4/4] x86/srso: Use CALL-based return thunks to reduce overhead

From: Andrew Cooper
Date: Mon Aug 21 2023 - 19:01:44 EST

Next message: Dmitry Baryshkov: "Re: [PATCH 5/9] i2c: qcom-cci: Use dev_err_probe in probe function"
Previous message: Steve Rutherford: "[PATCH v2] x86/sev: Make enc_dec_hypercall() accept a size instead of npages"
In reply to: Josh Poimboeuf: "Re: [PATCH RFC 4/4] x86/srso: Use CALL-based return thunks to reduce overhead"
Next in thread: Josh Poimboeuf: "Re: [PATCH RFC 4/4] x86/srso: Use CALL-based return thunks to reduce overhead"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 21/08/2023 4:16 pm, Josh Poimboeuf wrote:
> On Mon, Aug 21, 2023 at 12:27:23PM +0100, Andrew Cooper wrote:
>> The SRSO safety depends on having a CALL to an {ADD,LEA}/RET sequence which
>> has been made safe in the BTB. Specifically, there needs to be no pertubance
>> to the RAS between a correctly predicted CALL and the subsequent RET.
>>
>> Use the new infrastructure to CALL to a return thunk. Remove
>> srso_fam1?_safe_ret() symbols and point srso_fam1?_return_thunk().
>>
>> This removes one taken branch from every function return, which will reduce
>> the overhead of the mitigation. It also removes one of three moving pieces
>> from the SRSO mess.
> So, the address of whatever instruction comes after the 'CALL
> srso_*_return_thunk' is added to the RSB/RAS, and that might be
> speculated to when the thunk returns. Is that a concern?

That is very intentional, and key to the safety.

Replacing a RET with a CALL/{ADD,LEA}/RET sequence is a form of
retpoline thunk. The only difference with regular retpolines is that
the intended target is already on the stack, and not in a GPR.

If the CALL mispredicts, it doesn't matter. When decode catches up
(allegedly either instantaneously on Fam19h, or a few cycles late on
Fam17h), the top of the RAS is corrected will point at the INT3
following the CALL instruction.

When the CALL is corrected, speculation continues at the real
destination (the {ADD,LEA}/RET sequence) where the {ADD,LEA} pops the
"wrong" return address off the stack and lets the RET take the next
address up the stack.

The RET predicts to INT3 following the call (which is safe), and
eventually gets corrected to the parent return address on the stack
which is the real intended destination.

Therefore, rogue RET speculation is always safely contained at the INT3
until the RET uop can execute, notice the mispredict, and correct to
what the stack says is correct.

Of course, relying on the fact that the {ADD,LEA}+RET sequence doesn't
have poison in the BTB, which is what the UNTRAIN_RET sequence is trying
to achieve with microarchitectural means.

>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: x86@xxxxxxxxxx
>> CC: linux-kernel@xxxxxxxxxxxxxxx
>> CC: Borislav Petkov <bp@xxxxxxxxx>
>> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> CC: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
>> CC: Babu Moger <babu.moger@xxxxxxx>
>> CC: David.Kaplan@xxxxxxx
>> CC: Nikolay Borisov <nik.borisov@xxxxxxxx>
>> CC: gregkh@xxxxxxxxxxxxxxxxxxx
>> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>
>> RFC:
>>
>> vmlinux.o: warning: objtool: srso_fam17_return_thunk(): can't find starting instruction
>>
>> Any objtool whisperers know what's going on, and particularly why
>> srso_fam19_return_thunk() appears to be happy?
>>
>> Also, depends on the resolution of the RFC in the previous patch.
> I can take a look.

Thanks.

~Andrew

Next message: Dmitry Baryshkov: "Re: [PATCH 5/9] i2c: qcom-cci: Use dev_err_probe in probe function"
Previous message: Steve Rutherford: "[PATCH v2] x86/sev: Make enc_dec_hypercall() accept a size instead of npages"
In reply to: Josh Poimboeuf: "Re: [PATCH RFC 4/4] x86/srso: Use CALL-based return thunks to reduce overhead"
Next in thread: Josh Poimboeuf: "Re: [PATCH RFC 4/4] x86/srso: Use CALL-based return thunks to reduce overhead"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]