Re: printk.time causes rare kernel boot hangs

From: Peter Zijlstra
Date: Wed Jun 14 2023 - 08:53:37 EST


On Wed, Jun 14, 2023 at 01:35:36PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 14, 2023 at 11:39:53AM +0100, Richard W.M. Jones wrote:
> > Got it!
> >
> > #0  arch_static_branch (branch=false, key=<optimized out>)
> >     at ./arch/x86/include/asm/jump_label.h:27
> > #1  static_key_false (key=<optimized out>) at ./include/linux/jump_label.h:207
> > #2  native_write_msr (msr=1760, low=1876580734, high=106)
> >     at ./arch/x86/include/asm/msr.h:147
> > #3  0xffffffff8107997c in paravirt_write_msr (high=<optimized out>,
> >     low=1876580734, msr=1760) at ./arch/x86/include/asm/paravirt.h:196
> > #4  wrmsrl (val=<optimized out>, msr=1760)
> >     at ./arch/x86/include/asm/paravirt.h:229
> > #5  lapic_next_deadline (delta=<optimized out>, evt=<optimized out>)
> >     at arch/x86/kernel/apic/apic.c:491
> > #6  0xffffffff811f7b1d in clockevents_program_event (dev=0xffff88804e820dc0,
> >     expires=<optimized out>, force=<optimized out>)
> >     at kernel/time/clockevents.c:334
> > #7  0xffffffff811f81b0 in tick_handle_periodic (dev=0xffff88804e820dc0)
> >     at kernel/time/tick-common.c:133
> > #8  0xffffffff810796c1 in local_apic_timer_interrupt ()
> >     at arch/x86/kernel/apic/apic.c:1095
> > #9  __sysvec_apic_timer_interrupt (regs=regs@entry=0xffffc90000003ee8)
> >     at arch/x86/kernel/apic/apic.c:1112
> > #10 0xffffffff81f9cf09 in sysvec_apic_timer_interrupt (regs=0xffffc90000003ee8)
> >     at arch/x86/kernel/apic/apic.c:1106
> > #11 0xffffffff820015ca in asm_sysvec_apic_timer_interrupt ()
> >     at ./arch/x86/include/asm/idtentry.h:645
> > #12 0x0000000000000000 in ?? ()
>
> Uuuhh.. something is really fishy here. The thing in common between the
> fingered patch and this stacktrace is the jump_label/static_branch
> usage, but they're quite different paths.
>
> There is no printk or local_clock() in sight here.
>
> I've got that plain qemu thing running on:
>
> defconfig + kvm_guest.config + CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
>
> and have added "nokaslr" to the -append string. Let's see if it wants to
> go wobbly for me.


Ooooh, what qemu version do you have? There were some really dodgy
reports all around self-modifying code, all reported on 7.2, that seem
to have gone away with 8.

Now, all of them were using TCG, and I think you're using KVM.

I've got at least 36000 cycles in and still nothing :-(, let me go try your
.config.
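
For anyone reading the quoted backtrace: frame #2 is a write to MSR 1760,
which is 0x6e0, i.e. IA32_TSC_DEADLINE, so this is just the periodic tick
re-arming the local APIC timer. A minimal sketch of the decode, assuming
only the numbers gdb printed above (the helper name decode_wrmsr is made
up for illustration, it is not a kernel function):

```python
# Decode the WRMSR operands shown in frame #2 (native_write_msr).
# MSR_IA32_TSC_DEADLINE is the architectural MSR index 0x6e0;
# decode_wrmsr() is a hypothetical helper, not anything from the kernel.
MSR_IA32_TSC_DEADLINE = 0x6E0


def decode_wrmsr(msr, low, high):
    """Join the two 32-bit halves (EAX=low, EDX=high) into one 64-bit value."""
    value = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    return msr, value


# Values copied from the backtrace: msr=1760, low=1876580734, high=106.
msr, deadline = decode_wrmsr(1760, 1876580734, 106)

print(f"msr      = {msr:#x} (TSC deadline: {msr == MSR_IA32_TSC_DEADLINE})")
print(f"deadline = {deadline:#x}")
```

The combined value is the absolute TSC count at which the next timer
interrupt should fire, which fits lapic_next_deadline() in frame #5.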