Re: [PATCH 2/2] x86/retpoline,kprobes: Avoid treating rethunk as an indirect jump

From: Google
Date: Fri Jul 07 2023 - 10:39:27 EST


On Thu, 6 Jul 2023 13:34:03 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Thu, Jul 06, 2023 at 06:00:14PM +0900, Masami Hiramatsu wrote:
> > On Thu, 6 Jul 2023 09:17:05 +0200
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > On Thu, Jul 06, 2023 at 09:47:23AM +0900, Masami Hiramatsu wrote:
> > >
> > > > > > If I understand correctly, all indirect jump will be replaced with JMP_NOSPEC.
> > > > > > If you read the insn_jump_into_range, I onlu jecks the jump code, not call.
> > > > > > So the functions only have indirect call still allow optprobe.
> > > > >
> > > > > With the introduction of kCFI JMP_NOSPEC is no longer an equivalent to a
> > > > > C indirect jump.
> > > >
> > > > If I understand correctly, kCFI is enabled by CFI_CLANG, and clang is not
> > > > using jump-tables by default, so we can focus on gcc. In that case
> > > > current check still work, correct?
> > >
> > > IIRC clang can use jump tables, but like GCC needs RETPOLINE=n and
> > > IBT=n, so effectively nobody has them.
> >
> > So if it requires RETPOLINE=n, current __indirect_thunk_start/end checking
> > is not required, right? (that code is embraced with "#ifdef CONFIG_RETPOLINE")
>
> Correct.
>
> > >
> > > The reason I did mention kCFI though is that kCFI has a larger 'indirect
> > > jump' sequence, and I'm not sure we've thought about what can go
> > > sideways if that's optprobed.
> >
> > If I understand correctly, kCFI checks only indirect function call (check
> > pointer), so no jump tables. Or does it use indirect 'jump' ?
>
> Yes, it's indirect function calls only.
>
> Imagine our function (bar) doing an indirect call, it will (as clang
> always does) have the function pointer in r11:
>
> bar:
> ...
> movl $(-0x12345678),%r10d
> addl -15(%r11), %r10d
> je 1f
> ud2
> 1: call __x86_indirect_thunk_r11
>
>
>
> And then the function it calls (foo) looks like:
>
> __cfi_foo:
> movl $0x12345678, %eax
> .skip 11, 0x90
> foo:
> endbr
> ....
>
>
>
> So if the caller (in bar) and the callee (foo) have the same hash value
> (0x12345678 in this case) then it will be equal and we continue on our
> merry way.
>
> However, if they do not match, we'll trip that #UD and the
> handle_cfi_failure() will try and match the address to
> __{start,stop}__kcfi_traps[]. Additinoally decode_cfi_insn() will try
> and decode that whole call sequence in order to obtain the target
> address and typeid (hash).

Thank you for the explanation! This helps me!

>
> optprobes might disturb this code.

So either optprobe or kprobes (any text instrumentation) do not touch
__cfi_FUNC symbols light before FUNC.

>
> > > I suspect the UD2 that's in there will go 'funny' if it's relocated into
> > > an optprobe, as in, it'll not be recognised as a CFI fail.
> >
> > UD2 can't be optprobed (kprobe neither) because it can change the dumped
> > BUG address...
>
> Right, same problem here. But could the movl/addl be opt-probed? That
> would wreck decode_cfi_insn(). Then again, if decode_cfi_insn() fails,
> we'll get report_cfi_failure_noaddr(), which is less informative.

Ok, so if that sequence is always expected, I can also prohibit probing it.
Or, maybe it is better to generalize the API to access original instruction
which is used from kprobes, so that decode_cfi_insn() can get the original
(non-probed) insn.

>
> So it looks like nothing too horrible happens...
>
>


Thank you,

--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>