Fwd: About patch bdedff263132 - KVM: x86: Route pending NMIs

From: Prasad Pandit
Date: Fri Dec 29 2023 - 12:59:50 EST


Hello Sean,

On Tue, 31 Oct 2023 at 17:45, Prasad Pandit <ppandit@xxxxxxxxxx> wrote:
> On Mon, 30 Oct 2023 at 20:41, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>> - kvm_make_request(KVM_REQ_NMI, vcpu);
>>> + if (events->nmi.pending)
>>> + kvm_make_request(KVM_REQ_NMI, vcpu);
> >
> > This looks sane, but it should be unnecessary as KVM_REQ_NMI with nmi_queued=0 should
> > be a (costly) nop. Hrm, unless the vCPU is in HLT, in which case KVM will treat
> > a spurious KVM_REQ_NMI as a wake event. When I made this change, my assumption
> > was that userspace would set KVM_VCPUEVENT_VALID_NMI_PENDING iff there was
> > relevant information to process. But if I'm reading the code correctly, QEMU
> > invokes KVM_SET_VCPU_EVENTS with KVM_VCPUEVENT_VALID_NMI_PENDING at the end of
> > machine creation.
> >

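For reference, the hunk under discussion is in kvm_vcpu_ioctl_x86_set_vcpu_events()
in arch/x86/kvm/x86.c. With the proposed guard applied, the relevant block would
look roughly like this (a sketch, not the exact upstream code):

    if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
        vcpu->arch.nmi_pending = 0;
        atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
        /* Kick the vCPU only when userspace actually queues NMIs, so an
         * empty update does not act as a spurious wake event for a halted
         * vCPU. */
        if (events->nmi.pending)
            kvm_make_request(KVM_REQ_NMI, vcpu);
    }
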
QEMU:
qemu_thread_start
kvm_start_vcpu_thread
kvm_vcpu_thread_fn
kvm_cpu_exec
kvm_arch_put_registers
kvm_put_vcpu_events (cpu=..., level=1)

qemu_thread_start (args=0x559fdc852110) at ../util/qemu-thread-posix.c:534
kvm_vcpu_thread_fn (arg=0x559fdc84cdc0) at ../accel/kvm/kvm-accel-ops.c:56
qemu_wait_io_event (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:435
qemu_wait_io_event_common (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:411
process_queued_cpu_work (cpu=0x559fdc84cdc0) at ../cpus-common.c:351
do_kvm_cpu_synchronize_post_reset (cpu=0x559fdc84cdc0, arg=...) at ../accel/kvm/kvm-all.c:2808
kvm_arch_put_registers (cpu=0x559fdc84cdc0, level=2) at ../target/i386/kvm/kvm.c:4664
kvm_put_vcpu_events (cpu=0x559fdc84cdc0, level=2) at ../target/i386/kvm/kvm.c:4308

qemu_thread_start (args=0x559fdc852110) at ../util/qemu-thread-posix.c:534
kvm_vcpu_thread_fn (arg=0x559fdc84cdc0) at ../accel/kvm/kvm-accel-ops.c:56
qemu_wait_io_event (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:435
qemu_wait_io_event_common (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:411
process_queued_cpu_work (cpu=0x559fdc84cdc0) at ../cpus-common.c:351
do_kvm_cpu_synchronize_post_init (cpu=0x559fdc84cdc0, arg=...) at ../accel/kvm/kvm-all.c:2819
kvm_arch_put_registers (cpu=0x559fdc84cdc0, level=3) at ../target/i386/kvm/kvm.c:4664
kvm_put_vcpu_events (cpu=0x559fdc84cdc0, level=3) at ../target/i386/kvm/kvm.c:4308

Kernel:
kvm_vcpu_ioctl
  mutex_lock_killable(&vcpu->mutex)
  kvm_arch_vcpu_ioctl(, KVM_SET_VCPU_EVENTS, ...)
    -> kvm_vcpu_ioctl_x86_set_vcpu_events()
  mutex_unlock(&vcpu->mutex)
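
For context, the dispatch in virt/kvm/kvm_main.c is roughly as below (heavily
simplified; most ioctls are handled inline and only the remainder fall through
to the arch code):

    static long kvm_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
    {
        struct kvm_vcpu *vcpu = filp->private_data;
        long r;

        if (mutex_lock_killable(&vcpu->mutex))      /* serializes all per-vCPU ioctls */
            return -EINTR;
        /* KVM_SET_VCPU_EVENTS ends up in the x86 arch code, which calls
         * kvm_vcpu_ioctl_x86_set_vcpu_events(). */
        r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
        mutex_unlock(&vcpu->mutex);
        return r;
    }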

* Above are 3 different ways in which the KVM_SET_VCPU_EVENTS ioctl(2) gets called.
In QEMU/target/i386/kvm/kvm.c: kvm_put_vcpu_events():

    if (level >= KVM_PUT_RESET_STATE) {
        events.flags |= KVM_VCPUEVENT_VALID_NMI_PENDING;

But KVM_VCPUEVENT_VALID_NMI_PENDING is set only when level >= 2 (KVM_PUT_RESET_STATE),
i.e. in the first (level=1) case _NMI_PENDING is not set.
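
For completeness, the surrounding logic in kvm_put_vcpu_events() is roughly the
following (paraphrased, not an exact copy): the NMI fields are filled in
unconditionally, but the flag that makes the kernel look at them is gated on
the level.

    events.nmi.injected = env->nmi_injected;
    events.nmi.pending  = env->nmi_pending;
    events.nmi.masked   = !!(env->hflags2 & HF2_NMI_MASK);

    if (level >= KVM_PUT_RESET_STATE) {
        /* Only set for level >= 2, i.e. post-reset/post-init, never for level=1. */
        events.flags |= KVM_VCPUEVENT_VALID_NMI_PENDING;
    }

    return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);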

* In the real-time host set-up I have, KVM_SET_VCPU_EVENTS with
KVM_VCPUEVENT_VALID_NMI_PENDING set is invoked twice for each vCPU, and after
that the kernel goes into what looks like a lock-contention loop. Each time,
KVM_VCPUEVENT_VALID_NMI_PENDING is set with 'cpu->env->nmi_injected = 0' and
'cpu->env->nmi_pending = 0', i.e. for each vCPU two NMI events are injected via
kvm_make_request(KVM_REQ_NMI, vcpu) when the vCPU has no NMIs pending.
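
If I read the kernel side correctly, the reason such an empty request is not a
harmless no-op is that a halted vCPU treats a pending KVM_REQ_NMI as a
wake/runnable event; roughly, in kvm_vcpu_has_events() (paraphrased, exact form
may differ):

    /* The KVM_REQ_NMI bit alone is enough to report the vCPU as having a
     * pending event, i.e. to wake it out of HLT, even though nmi_queued and
     * nmi_pending are both zero. */
    if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
        (vcpu->arch.nmi_pending &&
         static_call(kvm_x86_nmi_allowed)(vcpu, false)))
        return true;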

# perf lock report -t
Name              acquired  contended   avg wait  total wait    max wait   min wait

CPU 3/KVM           154017     154017   62.19 us      9.58 s   101.01 us    1.49 us
CPU 9/KVM           152796     152796   62.67 us      9.58 s    95.92 us    1.49 us
CPU 7/KVM           151554     151554   63.16 us      9.57 s   102.70 us    1.48 us
CPU 1/KVM           151273     151273   65.30 us      9.88 s    98.88 us    1.52 us
CPU 6/KVM           151107     151107   63.34 us      9.57 s   107.64 us    1.50 us
CPU 8/KVM           151038     151038   63.37 us      9.57 s   102.93 us    1.51 us
CPU 2/KVM           150701     150701   63.52 us      9.57 s    99.24 us    1.50 us
CPU 5/KVM           150695     150695   63.56 us      9.58 s   142.15 us    1.50 us
CPU 4/KVM           150527     150527   63.60 us      9.57 s   102.04 us    1.44 us
qemu-system-x86        665        665   65.92 us    43.84 ms   100.67 us    1.55 us
CPU 0/KVM                2          2  210.46 us   420.92 us   411.89 us    9.03 us
qemu-system-x86          1          1  404.91 us   404.91 us   404.91 us  404.91 us
TC tc-pc.ram             1          1  414.22 us   414.22 us   414.22 us  414.22 us
=== output for debug===
bad: 10, total: 13
bad rate: 76.92 %
histogram of events caused bad sequence
acquire: 0
acquired: 10
contended: 0
release: 0


* The vCPU#0 thread seems to wait indefinitely to acquire
qemu_mutex_iothread_lock() and cannot make any progress. The proposed patch
above, which checks 'events->nmi.pending' for a non-zero value before making
the request, helps to fix this issue.

...wdyt?

Thank you.
---
- Prasad
PS: The kvm_make_request() routine has the following comment; I wonder if
this is what is happening with the empty NMI events.

    Request that don't require vCPU action should never be logged in
    vcpu->requests.  The vCPU won't clear the request, so it will stay
    logged indefinitely and prevent the vCPU from entering the guest.
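
For reference, that comment sits on kvm_make_request() in
include/linux/kvm_host.h, which looks roughly like this (paraphrased):

    static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
    {
        /*
         * Request that don't require vCPU action should never be logged in
         * vcpu->requests.  The vCPU won't clear the request, so it will stay
         * logged indefinitely and prevent the vCPU from entering the guest.
         */
        BUILD_BUG_ON(!__builtin_constant_p(req) ||
                     (req & KVM_REQUEST_NO_ACTION));

        __kvm_make_request(req, vcpu);  /* smp_wmb() + set_bit() in vcpu->requests */
    }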