Re: kvm guest softlockup

From: Wanpeng Li
Date: Sun Sep 04 2016 - 21:24:09 EST


2016-09-01 20:13 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
> I observed that a full dynticks kvm guest (w/o
> CONFIG_IRQ_TIME_ACCOUNTING) hits a soft lockup after host machine
> suspend/resume, and it is always CPU0 that is stuck.
>
> [ 186.311397] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 65s!
> [qemu-system-x86:2138]
> [ 186.313497] Modules linked in: kvm_intel kvm irqbypass
> snd_hda_codec_realtek snd_hda_codec_generic joydev snd_hda_intel
> hid_generic snd_hda_codec snd_hda_core snd_hwdep snd_pcm
> usbhid snd_seq_midi hid snd_rawmidi snd_seq_midi_event snd_seq
> psmouse pvpanic snd_seq_device serio_raw snd_timer snd i2c_piix4
> floppy
> [ 186.313513] irq event stamp: 2130266
> [ 186.313513] hardirqs last enabled at (2130265):
> [<ffffffff948c716c>] _raw_spin_unlock_irq+0x2c/0x50
> [ 186.313519] hardirqs last disabled at (2130266):
> [<ffffffffc046a3d2>] kvm_arch_vcpu_ioctl_run+0xa42/0x1aa0 [kvm]
> [ 186.313552] softirqs last enabled at (2129884):
> [<ffffffff948ca98a>] __do_softirq+0x33a/0x4a0
> [ 186.313554] softirqs last disabled at (2129877):
> [<ffffffff94094995>] irq_exit+0xd5/0xf0
> [ 186.313558] CPU: 0 PID: 2138 Comm: qemu-system-x86 Not tainted 4.8.0-rc4+ #30
> [ 186.313559] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Bochs 01/01/2011
> [ 186.313559] task: ffff8ad1f0475040 task.stack: ffff8ad1edec4000
> [ 186.313560] RIP: 0010:[<ffffffffc0512417>] [<ffffffffc0512417>]
> vmx_handle_external_intr+0x57/0x60 [kvm_intel]
> [ 186.313578] RSP: 0018:ffff8ad1edec7cf8 EFLAGS: 00000086
> [ 186.313579] RAX: ffff8ad1edec7cf8 RBX: ffff8ad1edea0100 RCX: ffffffff948c94a0
> [ 186.313579] RDX: ffffffff00000000 RSI: 0000000000000734 RDI: ffff8ad1edee8000
> [ 186.313580] RBP: ffff8ad1edec7cf8 R08: 0000000000000710 R09: 0000000000000710
> [ 186.313580] R10: ffff8ad1ed240724 R11: 0000000000000000 R12: 0000000000000000
> [ 186.313581] R13: 0000000000000000 R14: ffff8ad1ee035a00 R15: ffff8ad1edee8000
> [ 186.313582] FS: 00007fca4ac63700(0000) GS:ffff8ad1f5600000(0000)
> knlGS:0000000000000000
> [ 186.313583] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 186.313584] CR2: 00000000b7683200 CR3: 000000016dc9f000 CR4: 00000000001426f0
> [ 186.313587] Stack:
> [ 186.313587] ffff8ad1edec7db8 ffffffffc046a74f ffffffffc0469a1a
> ffff8ad1edec7d90
> [ 186.313590] ffff8ad1f0475040 ffff8ad1f0475040 ffff8ad1f0475040
> ffff8ad1edec8000
> [ 186.313592] 0000000000000000 ffff8ad1edea0278 ffff8ad1edea0100
> ffff8ad1edee8000
> [ 186.313594] Call Trace:
> [ 186.313603] [<ffffffffc046a74f>] kvm_arch_vcpu_ioctl_run+0xdbf/0x1aa0 [kvm]
> [ 186.313611] [<ffffffffc0469a1a>] ? kvm_arch_vcpu_ioctl_run+0x8a/0x1aa0 [kvm]
> [ 186.313618] [<ffffffffc044d3c3>] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> [ 186.313620] [<ffffffff9428230d>] ? __fget+0xfd/0x210
> [ 186.313622] [<ffffffff940e8f44>] ? __lock_is_held+0x54/0x70
> [ 186.313624] [<ffffffff942755c6>] do_vfs_ioctl+0x96/0x6a0
> [ 186.313625] [<ffffffff9428232c>] ? __fget+0x11c/0x210
> [ 186.313627] [<ffffffff94282215>] ? __fget+0x5/0x210
> [ 186.313628] [<ffffffff94275c49>] SyS_ioctl+0x79/0x90
> [ 186.313630] [<ffffffff94003ba1>] do_syscall_64+0x81/0x220
>
> cat /proc/stat | grep cpu in guest:
>
> cpu 398 16 5049 15754 5490 0 1 46 0 0
> cpu0 206 5 450 0 0 0 1 14 0 0
> cpu1 81 0 3937 3149 1514 0 0 9 0 0
> cpu2 45 6 332 6052 2243 0 0 11 0 0
> cpu3 65 2 328 6552 1732 0 0 11 0 0
>
> The idle and iowait values are oddly 0 for cpu0.
>
> Tested host versions: 4.8-rc4 and the Ubuntu default 3.16. Guest
> versions: <= 4.7-rc7 good, >= 4.8-rc1 bad. I am still bisecting
> the bug.

This commit fixes it:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=08d072599234c959b0b82b63fa252c129225a899
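
To check a guest for the same symptom, here is a minimal sketch that
parses /proc/stat and lists the CPUs whose idle and iowait counters are
both 0, as cpu0's were in the report above. The function names
(parse_proc_stat, cpus_without_idle) are made up for illustration, and
SAMPLE is just the output quoted above. The field order follows the
documented /proc/stat layout (user, nice, system, idle, iowait, ...).
A zero idle counter is only a hint, not proof of this bug.

```python
# Per-CPU fields of /proc/stat, in the order the kernel prints them.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")

# Output quoted in the report above (guest, after host suspend/resume).
SAMPLE = """\
cpu 398 16 5049 15754 5490 0 1 46 0 0
cpu0 206 5 450 0 0 0 1 14 0 0
cpu1 81 0 3937 3149 1514 0 0 9 0 0
cpu2 45 6 332 6052 2243 0 0 11 0 0
cpu3 65 2 328 6552 1732 0 0 11 0 0
"""


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for every 'cpu*' line in text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu"):
            continue  # skip intr, ctxt, btime, ... lines
        values = [int(v) for v in parts[1:1 + len(FIELDS)]]
        stats[parts[0]] = dict(zip(FIELDS, values))
    return stats


def cpus_without_idle(stats):
    """List per-CPU entries whose idle and iowait counters are both 0."""
    return [name for name, s in stats.items()
            if name != "cpu"  # skip the aggregate line
            and s["idle"] == 0 and s["iowait"] == 0]


if __name__ == "__main__":
    print(cpus_without_idle(parse_proc_stat(SAMPLE)))
```

On a live guest, pass the contents of /proc/stat instead of SAMPLE,
e.g. parse_proc_stat(open("/proc/stat").read()).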

Regards,
Wanpeng Li