RE: [tip: x86/bugs] x86/retpoline: Ensure default return thunk isn't used at runtime

From: Kaplan, David
Date: Tue Oct 17 2023 - 00:33:48 EST


> -----Original Message-----
> From: Nathan Chancellor <nathan@xxxxxxxxxx>
> Sent: Monday, October 16, 2023 4:48 PM
> To: Borislav Petkov <bp@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-tip-commits@xxxxxxxxxxxxxxx;
> Kaplan, David <David.Kaplan@xxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> Josh Poimboeuf <jpoimboe@xxxxxxxxxx>; Peter Zijlstra (Intel)
> <peterz@xxxxxxxxxxxxx>; x86@xxxxxxxxxx; llvm@xxxxxxxxxxxxxxx
> Subject: Re: [tip: x86/bugs] x86/retpoline: Ensure default return thunk isn't
> used at runtime
>
> On Mon, Oct 16, 2023 at 11:29:44PM +0200, Borislav Petkov wrote:
> > On Mon, Oct 16, 2023 at 02:10:40PM -0700, Nathan Chancellor wrote:
> > > I just bisected a boot failure that our continuous integration sees
> > > [1] with x86_64_defconfig + CONFIG_KCSAN=y to this change in
> > > -tip/-next. It does not appear to be clang specific, as I can
> > > reproduce it with GCC 13.2.0 from kernel.org [2] (the rootfs is
> > > available at [3], if it is necessary for reproducing).
> > >
> > > $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- defconfig
> > > $ scripts/config -e KCSAN
> > > $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
> > > $ qemu-system-x86_64 \
> > > -display none \
> > > -nodefaults \
> > > -d unimp,guest_errors \
> > > -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
> > > -kernel arch/x86/boot/bzImage \
> > > -initrd x86_64-rootfs.cpio \
> > > -cpu host \
> >

I think I found the problem, although I'm not sure of the best way to fix it.

When KCSAN is enabled, GCC generates many constructor functions named _sub_I_00099_0, each of which calls __tsan_init and then returns. Normally objtool annotates the returns in these functions and they are fixed up at runtime. However, objtool runs on vmlinux.o, and vmlinux.o does not include a couple of object files that are linked into vmlinux, such as init/version-timestamp.o and .vmlinux.export.o, both of which contain a _sub_I_00099_0 function. As a result, the returns in those two functions are never annotated, so they still go through the default return thunk, and the panic occurs when do_ctors calls one of them.

The difference shows up when counting these functions in vmlinux.o and in vmlinux:
$ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
2601
$ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
2603

If these functions are only run during kernel boot, there is no speculation concern. My first thought is that these two object files perhaps should be built without -mfunction-return=thunk-extern. That flag relies on objtool processing the object to get the intended behavior, and objtool isn't seeing these files.

Another option would be to not build these two files with KCSAN, as it looks like they are already excluded from KASAN and GCOV.
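
A minimal, untested sketch of that second option, assuming the per-object KCSAN_SANITIZE_<object> := n override from Kbuild is honored wherever each of these two objects is built, in the same way as the existing KASAN and GCOV exclusions (the placement in the respective Makefiles is an assumption and is not checked here):

    # Assumption: placed next to the existing KASAN/GCOV exclusions for
    # each object, in whichever Makefile builds it.
    # With KCSAN instrumentation off, no __tsan_init constructor
    # (_sub_I_00099_0) should be emitted for the object.
    KCSAN_SANITIZE_version-timestamp.o := n
    KCSAN_SANITIZE_.vmlinux.export.o := n

If that works, the two objdump counts above should come out equal again, since no constructor would exist outside vmlinux.o.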

--David Kaplan