Re: [tip: x86/bugs] x86/retpoline: Ensure default return thunk isn't used at runtime

From: Josh Poimboeuf
Date: Wed Oct 18 2023 - 14:14:38 EST


On Wed, Oct 18, 2023 at 07:55:31PM +0200, Borislav Petkov wrote:
> On Wed, Oct 18, 2023 at 08:54:33AM -0700, Josh Poimboeuf wrote:
> > On Wed, Oct 18, 2023 at 05:12:45PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 18, 2023 at 03:38:56PM +0200, Ingo Molnar wrote:
> > > > If then WARN_ONCE().
> > >
> > > WARN_ONCE() is not enough considering that if this fires, it means we're
> > > not really properly protected against one of those RET-speculation
> > > things.
> > >
> > > It needs to be warning constantly but then still allow booting. I.e.,
> > > a ratelimited warn of sorts but I don't think we have that... yet.
> >
> > I'm not sure a rate-limited WARN() would be a good thing. Either the
> > user is regularly checking dmesg (most likely in some automated fashion)
> > or they're not. If the latter, a rate-limited WARN() would wrap dmesg
> > pretty quickly.
>
> Well, freezing the box without any mention about why it happens is not
> viable either. So for lack of a better solution, overflowing dmesg is
> all we could do.

Why not just WARN_ONCE() then?
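
To make the trade-off concrete, here is a rough userspace sketch (not
kernel code) contrasting the two policies being discussed. Every name in
it (warn_once(), warn_ratelimited(), struct ratelimit, the count_*()
helpers) is hypothetical and only loosely modeled on the idea of a
once-only warning versus a "burst per interval" rate limit. It assumes
the bad path is hit on every tick, which is the worst case for dmesg.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a warn-once policy: report only the first hit. */
static bool warn_once(bool *warned, const char *msg)
{
	if (*warned)
		return false;
	*warned = true;
	fprintf(stderr, "WARNING: %s\n", msg);
	return true;
}

/* Hypothetical rate limiter: at most 'burst' reports per 'interval' ticks. */
struct ratelimit {
	unsigned long interval;
	unsigned int burst;
	unsigned long window_start;
	unsigned int printed;
};

static bool warn_ratelimited(struct ratelimit *rl, unsigned long now,
			     const char *msg)
{
	/* Open a new window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	fprintf(stderr, "WARNING: %s\n", msg);
	return true;
}

/* Number of reports emitted when the bad path is hit on every tick. */
unsigned int count_once(unsigned long ticks)
{
	bool warned = false;
	unsigned int n = 0;

	for (unsigned long t = 0; t < ticks; t++)
		n += warn_once(&warned, "default return thunk used");
	return n;
}

unsigned int count_ratelimited(unsigned long ticks, unsigned long interval,
			       unsigned int burst)
{
	struct ratelimit rl = { .interval = interval, .burst = burst };
	unsigned int n = 0;

	for (unsigned long t = 0; t < ticks; t++)
		n += warn_ratelimited(&rl, t, "default return thunk used");
	return n;
}
```

In this sketch the once-only policy emits a single report no matter how
long the box runs, while the rate-limited one keeps emitting 'burst'
reports per window for as long as the condition persists, so the log
grows linearly with uptime.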

> And, on a related note, I'm thinking I should revert:
>
> e92626af3234 ("x86/retpoline: Remove .text..__x86.return_thunk section")
>
> after all because I'm debugging another similar issue reported by
> dhowells.
>
> And I can reproduce it on linux-next with his config and gcc-13. The
> splat looks like this below - and mind you, that's in a VM. On baremetal
> you get to see only the first warning and output stops.
>
> And that happens because for whatever reason apply_returns() can't find
> that last jmp __x86_return_thunk for %r15 and it barfs.
>
> When I revert e92626af3234, it is fixed. It fixes dhowells' box too.
>
> Which means, IMHO, objtool is failing to add a return call site
> at the end of that __x86_indirect_thunk_r15.
>
> And considering how close we are to the merge window, I'd let that
> .text..__x86.return_thunk section exist so that objtool can find the
> return sites more reliably than what we currently have.
>
> We can always do e92626af3234 later, when it has seen more testing.

Ok. A revert is fine for now, but either way we do need to get to the
bottom of why objtool is messing up. Can you share the config?

--
Josh