[PATCH 0/5] Fix a race between posted interrupt delivery and migration in a nested VM

From: Mingwei Zhang
Date: Tue Aug 02 2022 - 19:07:28 EST


This patch set fixes a race condition between posted interrupt delivery
and migration for a nested VM. In particular, we show that a nested vCPU
that is halted and has just been migrated can lose a posted interrupt
sent by another vCPU in the same VM.

The series consists of one kernel change, which is the fix; the remaining
patches build a selftest that reproduces the racing scenario to prove that
the race exists. In summary, running this test on an unpatched kernel
triggers a warning [1], which shows that a posted interrupt is lost. Note
that the warning fires only once per boot, since it is a WARN_ON_ONCE.

[1] The kernel warning is triggered in arch/x86/kvm/vmx/vmx.c:

static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
{
	...
	if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
	    !nested_cpu_has_vid(get_vmcs12(vcpu)) ||
	    WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn)) <= HERE
		return false;
	...
}

The warning dump is below:

[237880.809453] ------------[ cut here ]------------
[237880.809455] WARNING: CPU: 21 PID: 112454 at
arch/x86/kvm/vmx/vmx.c:3973 vmx_guest_apic_has_interrupt+0x79/0xe0
[kvm_intel]
[237880.809469] Modules linked in: kvm_intel vfat fat i2c_mux_pca954x
i2c_mux spidev cdc_acm xhci_pci xhci_hcd sha3_generic gq(O)
[237880.809479] CPU: 21 PID: 112454 Comm: vmx_migrate_pi_ Tainted: G S
O 5.19.0-smp-DEV #2
......
[237880.809484] RIP: 0010:vmx_guest_apic_has_interrupt+0x79/0xe0
[kvm_intel]
[237880.809491] Code: c6 76 2d 41 81 e6 f0 00 00 00 48 8b 83 68 25 00 00
b9 f0 00 00 00 23 88 a0 00 00 00 44 39 f1 0f 92 c0 eb c0 0f 0b 31 c0 eb
ba <0f> 0b 31 c0 eb b4 80 3d 41 c9 02 00 00 74 39 48 c7 c7 18 f3 12 c0
[237880.809493] RSP: 0018:ffff88815c9e7d80 EFLAGS: 00010246
[237880.809495] RAX: ffff88813acbd000 RBX: ffff8881943ec9c0 RCX:
00000000ffffffff
[237880.809497] RDX: 0000000000000000 RSI: ffff8881d8676000 RDI:
ffff8881943ec9c0
[237880.809499] RBP: ffff88815c9e7d90 R08: ffff88815c9e7ce8 R09:
ffff88815c9e7cf0
[237880.809500] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000009aa8
[237880.809501] R13: ffff8881943ec9c0 R14: ffff8881943ed101 R15:
ffff8881943ec9c0
[237880.809503] FS: 00000000006283c0(0000) GS:ffff88af80740000(0000)
knlGS:0000000000000000
[237880.809505] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[237880.809506] CR2: 00007f9314b4f001 CR3: 00000001cd7b0005 CR4:
00000000003726e0
[237880.809508] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[237880.809509] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[237880.809511] Call Trace:
[237880.809512] <TASK>
[237880.809514] kvm_vcpu_has_events+0xe1/0x150
[237880.809519] vcpu_run+0xee/0x2c0
[237880.809523] kvm_arch_vcpu_ioctl_run+0x355/0x610
[237880.809526] kvm_vcpu_ioctl+0x551/0x610
[237880.809531] ? do_futex+0xc8/0x160
[237880.809537] __se_sys_ioctl+0x77/0xc0
[237880.809541] __x64_sys_ioctl+0x1d/0x20
[237880.809543] do_syscall_64+0x44/0xa0
[237880.809549] ? irqentry_exit+0x12/0x30
[237880.809552] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[237880.809555] RIP: 0033:0x471777
...
[237880.809570] </TASK>
[237880.809571] ---[ end trace 0000000000000000 ]---



Jim Mattson (1):
  selftests: KVM: Test if posted interrupt delivery race with migration

Mingwei Zhang (3):
  selftests: KVM/x86: Add APIC state into kvm_x86_state
  selftests: KVM: Introduce vcpu_run_interruptable()
  selftests: KVM: Add support for posted interrupt handling in L2

Oliver Upton (1):
  kvm: x86: get vmcs12 pages before checking pending interrupts

arch/x86/kvm/x86.c | 17 ++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 12 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/include/x86_64/vmx.h | 10 +
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +
.../selftests/kvm/lib/x86_64/processor.c | 2 +
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 16 +
.../kvm/x86_64/vmx_migrate_pi_pending.c | 289 ++++++++++++++++++
10 files changed, 360 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c

--
2.37.1.455.g008518b4e5-goog