Re: printk.time causes rare kernel boot hangs

From: Richard W.M. Jones
Date: Thu Jun 15 2023 - 07:30:13 EST


On Thu, Jun 15, 2023 at 11:04:29AM +0000, YiFei Zhu wrote:
> > FWIW attached is a test program that runs the qemu instances in
> > parallel (up to 8 threads), which seems to be a quicker way to hit the
> > problem for me. Even on Intel, with this test I can hit the bug in a
> > few hundred iterations.
>
> A friend sent me here so I took a look.
>
> I was unable to reproduce with this script after 10000 iterations,
> on an AMD Gentoo Linux host:
>
> Host kernel: 6.3.3 vanilla
> Guest kernel: git commit f31dcb152a3d0816e2f1deab4e64572336da197d
> Guest config: Provided full-fat Fedora config + CONFIG_GDB_SCRIPTS
> QEMU: 8.0.2 (with kvm_amd)
> Hardware: AMD Ryzen 7 PRO 5850U
>
> I wonder if anything on the host side affects this, or could be some
> sort of race condition.

We've had multiple independent reports of people reproducing the bug
since this story (unfortunately) hit Hacker News. Your configuration
above should be able to reproduce it, so I still don't know what the
deciding factor is.
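
For anyone reading without the attachment, the general shape of a parallel
reproducer like the one mentioned in the quoted text can be sketched as
below. This is only an illustrative sketch in Python, not the attached test
program: the function names, the timeout-based hang detection and the
defaults are all assumptions, and the guest command line is taken from the
caller rather than guessed here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def boot_once(cmd, timeout):
    """Run one guest boot. Return True if it exited, False if it hung."""
    try:
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # assumption: a boot that exceeds this is a hang
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing leaks.
        return False


def count_hangs(cmd, iterations, workers=8, timeout=60.0):
    """Boot `iterations` guests, at most `workers` at a time; count hangs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: boot_once(cmd, timeout),
                                range(iterations)))
    return results.count(False)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 repro.py <guest command and its arguments...>
    hangs = count_hangs(sys.argv[1:], iterations=1000)
    print(f"{hangs} hung boots out of 1000")
```

The guest command has to exit by itself after a successful boot, otherwise
every iteration is counted as a hang.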

[...]

> If you can reproduce the original bug (without the msleep or busy wait
> patch), could you check if you can reproduce that with idle=poll? If so,
> can you run "p show_state_filter(0)" so we get a stack trace of kernel_init,
> assuming it hit a similar issue as if msleep was added. If idle=poll does
> not work, or you can't call functions from within gdb (some old qemu versions
> did not support this), see if you can send an alt-sysrq-w to show stacks of
> blocked tasks.

(1) Adding idle=poll to the guest kernel command line:

=> Bug still occurs, with about the same frequency as before.

(2) Connecting with gdb to qemu's gdb-stub:

Trying to evaluate show_state_filter(0) didn't work for reasons I
don't understand:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: Remote gdbserver does not support determining executable automatically.
RHEL <=6.8 and <=7.2 versions of gdbserver do not support such automatic execut.
The following versions of gdbserver support it:
- Upstream version of gdbserver (unsupported) 7.10 or later
- Red Hat Developer Toolset (DTS) version of gdbserver from DTS 4.0 or later (o)
- RHEL-7.3 versions of gdbserver (on any architecture)
arch_static_branch (branch=false, key=<optimized out>)
    at ./arch/x86/include/asm/jump_label.h:27
27              asm_volatile_goto("1:"
(gdb) bt
#0  arch_static_branch (branch=false, key=<optimized out>)
    at ./arch/x86/include/asm/jump_label.h:27
#1  static_key_false (key=<optimized out>) at ./include/linux/jump_label.h:207
#2  native_write_msr (high=222, low=719927812, msr=1760)
    at ./arch/x86/include/asm/msr.h:147
#3  wrmsrl (val=954202667524, msr=1760) at ./arch/x86/include/asm/msr.h:262
#4  lapic_next_deadline (delta=474, evt=0xffff88804e81bf40)
    at arch/x86/kernel/apic/apic.c:491
#5  0xffffffff81143667 in clockevents_program_event (dev=0xffff88804e81bf40,
    expires=<optimized out>, force=<optimized out>)
    at kernel/time/clockevents.c:334
#6  0xffffffff81143c0b in tick_handle_periodic (dev=0xffff88804e81bf40)
    at kernel/time/tick-common.c:133
#7  0xffffffff8105d01c in local_apic_timer_interrupt ()
    at arch/x86/kernel/apic/apic.c:1095
#8  __sysvec_apic_timer_interrupt (regs=regs@entry=0xffffc90000003ee8)
    at arch/x86/kernel/apic/apic.c:1112
#9  0xffffffff81e61a91 in sysvec_apic_timer_interrupt (regs=0xffffc90000003ee8)
    at arch/x86/kernel/apic/apic.c:1106
#10 0xffffffff8200144a in asm_sysvec_apic_timer_interrupt ()
    at ./arch/x86/include/asm/idtentry.h:645
#11 0x0000000000000000 in ?? ()
(gdb) p show_state_filter(0)
[Inferior 1 (process 1) exited normally]
The program being debugged exited while in a function called from GDB.
Evaluation of the expression containing the function
(show_state_filter) will be abandoned.
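
A side note on frames #2 and #3 of the backtrace: gdb prints the MSR number
and value in decimal, which hides what is being written. The small sketch
below (Python, not part of the gdb session; the helper name is made up for
illustration) converts them. msr=1760 is 0x6e0, which the kernel headers
call MSR_IA32_TSC_DEADLINE, consistent with lapic_next_deadline in frame #4,
and the high/low halves passed to native_write_msr recombine to the 64-bit
value passed to wrmsrl.

```python
# Value of MSR_IA32_TSC_DEADLINE in the kernel's x86 msr-index.h.
MSR_IA32_TSC_DEADLINE = 0x6E0


def combine(high, low):
    """Rebuild the 64-bit MSR value from the two 32-bit halves."""
    return (high << 32) | low


# Numbers copied from frames #2 and #3 of the backtrace.
msr_number = 1760
val = combine(high=222, low=719927812)

print(hex(msr_number), msr_number == MSR_IA32_TSC_DEADLINE)
print(val, val == 954202667524)
```

So the vCPU was stopped while programming the next TSC-deadline timer event
from the periodic tick handler.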

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top