Re: [RFC][PATCH 02/17] x86/cpu: Clean up SRSO return thunk mess

From: Peter Zijlstra
Date: Sat Aug 12 2023 - 07:21:37 EST


On Fri, Aug 11, 2023 at 10:00:31AM -0700, Nick Desaulniers wrote:
> On Fri, Aug 11, 2023 at 12:01 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Aug 10, 2023 at 02:37:56PM +0200, Peter Zijlstra wrote:
> >
> > > After this patch things look equivalent to:
> > >
> > > SYM_FUNC_START(foo)
> > > ...
> > > ALTERNATIVE "ret; int3"
> > > "jmp __x86_return_thunk", X86_FEATURE_RETHUNK
> > > "jmp srso_return_thunk", X86_FEATURE_SRSO
> > > "jmp srso_alias_return_thunk", X86_FEATURE_SRSO_ALIAS
> > > SYM_FUNC_END(foo)
> > >
> > > SYM_CODE_START(srso_return_thunk)
> > > UNWIND_HINT_FUNC
> > > ANNOTATE_NOENDBR
> > > call srso_safe_ret;
> > > ud2
> > > SYM_CODE_END(srso_return_thunk)
> > >
> > > SYM_CODE_START(srso_alias_return_thunk)
> > > UNWIND_HINT_FUNC
> > > ANNOTATE_NOENDBR
> > > call srso_alias_safe_ret;
> > > ud2
> > > SYM_CODE_END(srso_alias_return_thunk)
> > >
> >
> > So it looks like the compilers are still not emitting int3 after jmp,
> > even with the SLS options enabled :/
> >
> > This means the tail end of functions compiled with:
> >
> > -mharden-sls=all -mfunction-return=thunk-extern
> >
> > is still a regular 'jmp __x86_return_thunk', with no trailing trap.
> >
> > https://godbolt.org/z/Ecqv76YbE
>
> I don't have time to finish this today, but
> https://reviews.llvm.org/D157734 should do what you're looking for, I
> think.

Hmm, so your wording seems to imply regular SLS would already emit INT3
after jump, but I'm not seeing that in clang-16 output. Should I upgrade
my llvm?

[[edit]] Oooh, now I see: regular SLS would emit RET; INT3, but what I
was alluding to is that sls=all should also emit INT3 after every JMP,
due to AMD BTC. This is an SLS option that seems to have gone missing in
both compilers for a long while.


And yesterday I only quickly looked at bigger gcc output and not clang.
But when I look at clang-16 output I see things like:

1053: 2e e8 00 00 00 00 cs call 1059 <yield_to+0xe9>
        1055: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4
1059: 84 c0 test %al,%al
105b: 74 1c je 1079 <yield_to+0x109>
105d: eb 6e jmp 10cd <yield_to+0x15d>

No INT3

105f: 41 bc 01 00 00 00 mov $0x1,%r12d
1065: 80 7c 24 04 00 cmpb $0x0,0x4(%rsp)
106a: 74 0d je 1079 <yield_to+0x109>
106c: 4d 39 fe cmp %r15,%r14
106f: 74 08 je 1079 <yield_to+0x109>
1071: 4c 89 ff mov %r15,%rdi
1074: e8 00 00 00 00 call 1079 <yield_to+0x109>
        1075: R_X86_64_PLT32 resched_curr-0x4
1079: 4d 39 fe cmp %r15,%r14
107c: 74 08 je 1086 <yield_to+0x116>
107e: 4c 89 ff mov %r15,%rdi
1081: e8 00 00 00 00 call 1086 <yield_to+0x116>
        1082: R_X86_64_PLT32 _raw_spin_unlock-0x4
1086: 4c 89 f7 mov %r14,%rdi
1089: e8 00 00 00 00 call 108e <yield_to+0x11e>
        108a: R_X86_64_PLT32 _raw_spin_unlock-0x4
108e: f7 c3 00 02 00 00 test $0x200,%ebx
1094: 74 06 je 109c <yield_to+0x12c>
1096: ff 15 00 00 00 00 call *0x0(%rip) # 109c <yield_to+0x12c>
        1098: R_X86_64_PC32 pv_ops+0xfc
109c: 45 85 e4 test %r12d,%r12d
109f: 7e 05 jle 10a6 <yield_to+0x136>
10a1: e8 00 00 00 00 call 10a6 <yield_to+0x136>
        10a2: R_X86_64_PLT32 schedule-0x4
10a6: 44 89 e0 mov %r12d,%eax
10a9: 48 83 c4 08 add $0x8,%rsp
10ad: 5b pop %rbx
10ae: 41 5c pop %r12
10b0: 41 5d pop %r13
10b2: 41 5e pop %r14
10b4: 41 5f pop %r15
10b6: 5d pop %rbp
10b7: 2e e9 00 00 00 00 cs jmp 10bd <yield_to+0x14d>
        10b9: R_X86_64_PLT32 __x86_return_thunk-0x4

CS padding!!

10bd: 41 bc fd ff ff ff mov $0xfffffffd,%r12d
10c3: f7 c3 00 02 00 00 test $0x200,%ebx


So since you (surprisingly!) CS-pad the return thunk jump, I *could* pull
it off there: 6 bytes is enough space to write 'CALL foo; INT3'.
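
To make the byte count concrete, here is a minimal sketch in plain userspace
C, not kernel patching code. It assumes the slot is the 6-byte 'cs jmp rel32'
seen at 10b7 above (2e e9 + 4 bytes) and that the replacement is a 5-byte
CALL rel32 (e8 + 4 bytes) followed by a 1-byte INT3 (cc). The helper name
write_call_int3 and the target address used in the usage note are made up
for illustration.

```c
#include <stdint.h>

#define SLOT_LEN 6   /* cs jmp rel32: 2e e9 xx xx xx xx */
#define CALL_LEN 5   /* call rel32:   e8 xx xx xx xx    */

/*
 * Hypothetical helper: overwrite a 6-byte slot located at slot_addr
 * with 'CALL target; INT3'. Returns 0 on success, -1 if the target is
 * out of rel32 range.
 */
static int write_call_int3(uint8_t slot[SLOT_LEN], uint64_t slot_addr,
                           uint64_t target)
{
    /* rel32 is relative to the address of the instruction after the CALL */
    int64_t delta = (int64_t)(target - (slot_addr + CALL_LEN));
    uint32_t rel;

    if (delta < INT32_MIN || delta > INT32_MAX)
        return -1;

    rel = (uint32_t)(int32_t)delta;

    slot[0] = 0xe8;                     /* CALL rel32 opcode */
    slot[1] = (uint8_t)(rel & 0xff);    /* displacement, little-endian */
    slot[2] = (uint8_t)((rel >> 8) & 0xff);
    slot[3] = (uint8_t)((rel >> 16) & 0xff);
    slot[4] = (uint8_t)((rel >> 24) & 0xff);
    slot[5] = 0xcc;                     /* INT3 fills the sixth byte */

    return 0;
}
```

With slot_addr 0x10b7 (the offset of the cs jmp above) and an arbitrary
example target of 0x2000, the slot becomes e8 44 0f 00 00 cc, which is
exactly 6 bytes with nothing left over.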

But really SLS *should* put INT3 after every JMP instruction -- of
course including the return thunk one.