Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu

From: Dongli Zhang
Date: Thu Feb 08 2024 - 19:34:50 EST




On 2/6/24 12:58, Sean Christopherson wrote:
> On Tue, Feb 06, 2024, Dongli Zhang wrote:
>> Hi Prasad,
>>
>> On 1/2/24 23:53, Prasad Pandit wrote:
>>> From: Prasad Pandit <pjp@xxxxxxxxxxxxxxxxx>
>>>
>>> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
>>> request for a vcpu even when its 'events->nmi.pending' is zero.
>>> Ex:
>>> qemu_thread_start
>>> kvm_vcpu_thread_fn
>>> qemu_wait_io_event
>>> qemu_wait_io_event_common
>>> process_queued_cpu_work
>>> do_kvm_cpu_synchronize_post_init/_reset
>>> kvm_arch_put_registers
>>> kvm_put_vcpu_events (cpu, level=[2|3])
>>>
>>> This leads vCPU threads in QEMU to constantly acquire & release the
>>> global mutex lock, delaying the guest boot due to lock contention.
>>
>> Would you mind sharing where and how the lock contention is at QEMU space? That
>> is, how the QEMU mutex lock is impacted by KVM KVM_REQ_NMI?
>>
>> Or you meant line 3031 at QEMU side?
>
> Yeah, something like that. Details in this thread.
>
> https://urldefense.com/v3/__https://lore.kernel.org/all/CAE8KmOyffXD4k69vRAFwesaqrBGzFY3i*kefbkHcQf4=jNYzOA@mail.gmail.com__;Kw!!ACWV5N9M2RV99hQ!N61g2QXuC5B5RpVNBowgKUXjHzX4vp0lCXuton3fKVRbzBuXaVtJgePu0ddSf3EB9EEQORTmwop4vD5KrQ$

Thank you very much for pointing to the discussion. I should have found them :)

Here is my understanding.

1. During the VM creation, the mp_state of AP (non-BSP) is always
KVM_MP_STATE_UNINITIALIZED, until INIT/SIPI.

2. Ideally, AP should block at below. That is, line 3775 is always false.

3760 bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
3761 {
3762 struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
3763 bool waited = false;
3764
3765 vcpu->stat.generic.blocking = 1;
3766
3767 preempt_disable();
3768 kvm_arch_vcpu_blocking(vcpu);
3769 prepare_to_rcuwait(wait);
3770 preempt_enable();
3771
3772 for (;;) {
3773 set_current_state(TASK_INTERRUPTIBLE);
3774
3775 if (kvm_vcpu_check_block(vcpu) < 0)
3776 break;
3777
3778 waited = true;
3779 schedule();
3780 }

3. Unfortunately, the issue may set KVM_REQ_NMI for AP.

4. This leads to the kvm_vcpu_check_block() to return.

kvm_arch_vcpu_ioctl_run()
-> kvm_vcpu_block()
-> kvm_vcpu_check_block()
-> kvm_arch_vcpu_runnable()
-> kvm_vcpu_has_events()
-> kvm_test_request(KVM_REQ_NMI, vcpu)


5. The kvm_arch_vcpu_ioctl_run() returns to QEMU with -EAGAIN.

6. The QEMU side is not able to handle -EAGAIN, but to goto line 2984 to return.

It acquires the global mutex at line 2976 (release before entering into guest
again). The KVM_REQ_NMI is never cleared until INIT/SIPI.


2808 int kvm_cpu_exec(CPUState *cpu)
2809 {
.. ...
2868 if (run_ret < 0) {
2869 if (run_ret == -EINTR || run_ret == -EAGAIN) {
2870 trace_kvm_io_window_exit();
2871 kvm_eat_signals(cpu);
2872 ret = EXCP_INTERRUPT;
2873 break;
2874 }
.. ...
2973 } while (ret == 0);
2974
2975 cpu_exec_end(cpu);
2976 bql_lock();
2977
2978 if (ret < 0) {
2979 cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
2980 vm_stop(RUN_STATE_INTERNAL_ERROR);
2981 }
2982
2983 qatomic_set(&cpu->exit_request, 0);
2984 return ret;
2985 }


7. The QEMU AP vCPU thread enters into KVM_RUN again. Same flow as step 4, goto
step 4, again and again.

The lock has been frequently acquired/released. The vCPU 0 is unhappy with it,
especially when the number of APs is large!

I guess it is not an issue after VM reboot (without QEMU instance re-creation
because the mpstate is not KVM_MP_STATE_UNINITIALIZED any longer).


Thank you very much!

Dongli Zhang

>
>> 2858 int kvm_cpu_exec(CPUState *cpu)
>> 2859 {
>> 2860 struct kvm_run *run = cpu->kvm_run;
>> 2861 int ret, run_ret;
>> ... ...
>> 3023 default:
>> 3024 DPRINTF("kvm_arch_handle_exit\n");
>> 3025 ret = kvm_arch_handle_exit(cpu, run);
>> 3026 break;
>> 3027 }
>> 3028 } while (ret == 0);
>> 3029
>> 3030 cpu_exec_end(cpu);
>> 3031 qemu_mutex_lock_iothread();
>> 3032
>> 3033 if (ret < 0) {
>> 3034 cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
>> 3035 vm_stop(RUN_STATE_INTERNAL_ERROR);
>> 3036 }
>> 3037
>> 3038 qatomic_set(&cpu->exit_request, 0);
>> 3039 return ret;
>> 3040 }