Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

From: Oliver Sang
Date: Thu Mar 28 2024 - 01:57:30 EST


hi, Ard Biesheuvel,

On Tue, Mar 26, 2024 at 10:59:04AM +0200, Ard Biesheuvel wrote:
> On Tue, 26 Mar 2024 at 10:31, Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> >
> > hi, Ard Biesheuvel,
> >
> > On Mon, Mar 25, 2024 at 04:39:26PM +0200, Ard Biesheuvel wrote:
> > > On Sun, 24 Mar 2024 at 16:26, Borislav Petkov <bp@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Mar 22, 2024 at 05:03:18PM +0800, kernel test robot wrote:
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed "BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)" on:
> > > > >
> > > > > commit: 48204aba801f1b512b3abed10b8e1a63e03f3dd1 ("x86/sme: Move early SME kernel encryption handling into .head.text")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > >
> > > > > [test failed on linus/master 741e9d668aa50c91e4f681511ce0e408d55dd7ce]
> > > > > [test failed on linux-next/master a1e7655b77e3391b58ac28256789ea45b1685abb]
> > > > >
> > > > > in testcase: boot
> > > > >
> > > > > compiler: gcc-12
> > > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > > >
> > > > My guest boots with your .config and SNB as CPU model:
> > > >
> > > > ...
> > > > [ 0.373770][ T1] smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (family: 0x6, model: 0x2a, stepping: 0x1)
> > > >
> > > > Artefacts like:
> > > >
> > > > -initrd initrd-vm-meta-180.cgz
> > > >
> > > > or
> > > >
> > > > RESULT_ROOT=/result/boot/1/vm-snb/quantal-x86_64-core-20190426.cgz/x86_64-rhel-8.3-bpf/gcc-12/48204aba801f1b512b3abed10b8e1a63e03f3dd1/3
> > > >
> > > > I don't have and don't know how to generate here so I can't run your
> > > > exact reproducer.
> > > >
> > >
> > > I ran the reproducer using the instructions, and things seem to work fine.
> > >
> > > https://paste.debian.net/1311951/
> > >
> > > Could you provide any information regarding the QEMU version and its
> > > BIOS implementation?
> >
> > for QEMU version:
> >
> > $ qemu-system-x86_64 --version
> > QEMU emulator version 7.2.9 (Debian 1:7.2+dfsg-7+deb12u5)
> > Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
> >
>
> I tested the exact same version.
>
> Does it reproduce with -cpu host instead of -cpu SandyBridge? When
> running under KVM, I suspect emulating the actual host uarch rather
> than setting a different one is a more reliable strategy. What CPU
> type does the host have?


We have a machine pool with machines of different CPU models, and we deploy
VMs on them to run various boot/fuzz/functional tests. To avoid subtle issues,
we can't use '-cpu host' directly.


>
> >
> > for BIOS:
> >
> > We don't specify bios option for qemu, my understanding is we just run with
> > default bios for qemu (the seabios). Extra info of seabios
> >
>
> Today, legacy BIOS boot is only used by a minority of x86 systems in
> the field, so for better coverage, it would make sense to at least
> start testing UEFI as well.
>
> On debian, just install the ovmf package, and pass -bios
> /usr/share/ovmf/OVMF.fd on the QEMU command line.
>
> And given that you are doing virt based boot testing, another very
> important use case is TDX boot (as well as SEV-SNP, but that may be
> more difficult for you to organize). But please explore internally at
> Intel whether TDX can be added to your test matrix as well.

Thanks a lot for the great suggestions! We will investigate these.


Regarding this early-boot failure: after more tests, we suspect it may be
related to 3 configs. As we shared in [1], they are set as below when the
kernel runs into the early-boot failure:

# CONFIG_INIT_STACK_NONE is not set
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_GCC_PLUGIN_STACKLEAK=y


The early-boot failure will _disappear_ if either one of the following two
changes is made:

(1)
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
-# CONFIG_INIT_STACK_NONE is not set
+CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_STACK_ALL_PATTERN is not set
-CONFIG_INIT_STACK_ALL_ZERO=y
+# CONFIG_INIT_STACK_ALL_ZERO is not set
CONFIG_GCC_PLUGIN_STACKLEAK=y


(2)
CONFIG_INIT_STACK_ALL_ZERO=y
-CONFIG_GCC_PLUGIN_STACKLEAK=y
-# CONFIG_GCC_PLUGIN_STACKLEAK_VERBOSE is not set
-CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
-# CONFIG_STACKLEAK_METRICS is not set
-# CONFIG_STACKLEAK_RUNTIME_DISABLE is not set
+# CONFIG_GCC_PLUGIN_STACKLEAK is not set
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
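
To make the observation above easy to check against other configs, here is a
minimal sketch that reports whether a .config has both CONFIG_INIT_STACK_ALL_ZERO
and CONFIG_GCC_PLUGIN_STACKLEAK enabled, which is the combination we suspect.
The function names (parse_kconfig, has_suspect_combo) are made up for
illustration and are not part of our test infrastructure; the check only
encodes the suspicion described in this mail, not a confirmed root cause.

```python
def parse_kconfig(text):
    """Parse .config text into {symbol: value}.

    A line of the form '# CONFIG_X is not set' is recorded as 'n'.
    """
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
    return opts


def has_suspect_combo(text):
    """True if both options suspected in this report are set to 'y'."""
    opts = parse_kconfig(text)
    return (opts.get("CONFIG_INIT_STACK_ALL_ZERO") == "y"
            and opts.get("CONFIG_GCC_PLUGIN_STACKLEAK") == "y")
```

With the fragments from this mail, the failing config would return True, and
a config with either change (1) or change (2) applied would return False.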


[1]
https://download.01.org/0day-ci/archive/20240322/202403221630.2692c998-oliver.sang@xxxxxxxxx/config-6.8.0-rc6-00057-g48204aba801f