Re: [PATCH 2/2] x86/retpoline,kprobes: Avoid treating rethunk as an indirect jump

From: Peter Zijlstra
Date: Thu Jul 06 2023 - 07:34:29 EST


On Thu, Jul 06, 2023 at 06:00:14PM +0900, Masami Hiramatsu wrote:
> On Thu, 6 Jul 2023 09:17:05 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Thu, Jul 06, 2023 at 09:47:23AM +0900, Masami Hiramatsu wrote:
> >
> > > > > If I understand correctly, all indirect jumps will be replaced with JMP_NOSPEC.
> > > > > If you read insn_jump_into_range, it only checks the jump code, not call.
> > > > > So functions that only have indirect calls still allow optprobe.
> > > >
> > > > With the introduction of kCFI JMP_NOSPEC is no longer an equivalent to a
> > > > C indirect jump.
> > >
> > > If I understand correctly, kCFI is enabled by CFI_CLANG, and clang is not
> > > using jump-tables by default, so we can focus on gcc. In that case
> > > current check still works, correct?
> >
> > IIRC clang can use jump tables, but like GCC needs RETPOLINE=n and
> > IBT=n, so effectively nobody has them.
>
> So if it requires RETPOLINE=n, current __indirect_thunk_start/end checking
> is not required, right? (that code is embraced with "#ifdef CONFIG_RETPOLINE")

Correct.

> >
> > The reason I did mention kCFI though is that kCFI has a larger 'indirect
> > jump' sequence, and I'm not sure we've thought about what can go
> > sideways if that's optprobed.
>
> If I understand correctly, kCFI checks only indirect function call (check
> pointer), so no jump tables. Or does it use indirect 'jump' ?

Yes, it's indirect function calls only.

Imagine our function (bar) doing an indirect call; it will (as clang
always does) have the function pointer in r11:

bar:
	...
	movl	$(-0x12345678),%r10d
	addl	-15(%r11), %r10d
	je	1f
	ud2
1:	call	__x86_indirect_thunk_r11



And then the function it calls (foo) looks like:

__cfi_foo:
	movl	$0x12345678, %eax
	.skip	11, 0x90
foo:
	endbr
	....



So if the caller (in bar) and the callee (foo) have the same hash value
(0x12345678 in this case) then the addl result is zero, the je is taken
and we continue on our merry way.
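
As a rough illustration of the arithmetic only, the caller-side check can be
modelled in user-space C. The 5-byte movl in __cfi_foo (one opcode byte plus a
4-byte immediate) followed by 11 bytes of padding puts foo at __cfi_foo+16, so
-15(%r11) reads the hash immediate. The function name below is hypothetical
and this is a sketch, not kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the caller-side kCFI check.
 * caller_hash: the type hash the call site expects.
 * callee_hash: the 32-bit value read from -15(%r11), i.e. the
 *              immediate of the movl in the callee's __cfi_ preamble.
 */
static bool kcfi_hash_matches(uint32_t caller_hash, uint32_t callee_hash)
{
	/* movl $(-hash), %r10d : load the negated expected hash */
	uint32_t r10d = 0u - caller_hash;

	/* addl -15(%r11), %r10d : add the hash stored before the callee */
	r10d += callee_hash;

	/* je 1f : ZF is set only if the 32-bit sum wrapped to zero */
	return r10d == 0;
}
```

A false return corresponds to falling through into the ud2.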

However, if they do not match, we'll trip that #UD and
handle_cfi_failure() will try and match the address to
__{start,stop}__kcfi_traps[]. Additionally, decode_cfi_insn() will try
and decode that whole call sequence in order to obtain the target
address and typeid (hash).

optprobes might disturb this code.

> > I suspect the UD2 that's in there will go 'funny' if it's relocated into
> > an optprobe, as in, it'll not be recognised as a CFI fail.
>
> UD2 can't be optprobed (kprobe neither) because it can change the dumped
> BUG address...

Right, same problem here. But could the movl/addl be opt-probed? That
would wreck decode_cfi_insn(). Then again, if decode_cfi_insn() fails,
we'll get report_cfi_failure_noaddr(), which is less informative.

So it looks like nothing too horrible happens...
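
To make the outcomes discussed above concrete, here is a simplified
user-space model of the failure path. Everything in it is hypothetical: the
enum, the function names and the flat address table only stand in for
__{start,stop}__kcfi_traps[], decode_cfi_insn() and the two report variants,
and do not reflect the real kernel data layout or signatures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome labels for the cases discussed in this thread. */
enum cfi_report {
	NOT_CFI_TRAP,      /* #UD address is not in the trap table */
	CFI_REPORT_FULL,   /* sequence decoded: target and typeid known */
	CFI_REPORT_NOADDR  /* decode failed: less informative report */
};

/* Stand-in for matching the #UD address against the kCFI trap table. */
static bool is_known_trap(const uintptr_t *traps, size_t n, uintptr_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (traps[i] == addr)
			return true;
	return false;
}

/*
 * decode_ok models whether decoding the movl/addl sequence succeeded;
 * an optprobe over those instructions would correspond to decode_ok == false.
 */
static enum cfi_report classify_cfi_failure(const uintptr_t *traps, size_t n,
					    uintptr_t ud2_addr, bool decode_ok)
{
	if (!is_known_trap(traps, n, ud2_addr))
		return NOT_CFI_TRAP;
	return decode_ok ? CFI_REPORT_FULL : CFI_REPORT_NOADDR;
}
```

A relocated UD2 would land in the NOT_CFI_TRAP case, while a probed
movl/addl only degrades the report to the NOADDR variant.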