Re: [PATCH] KVM: Avoid atomic operations when kicking the running vCPU

From: Sean Christopherson
Date: Wed Oct 20 2021 - 15:34:29 EST


On Wed, Oct 20, 2021, Paolo Bonzini wrote:
> If we do have the vcpu mutex, as is the case if kvm_running_vcpu is set
> to the target vcpu of the kick, changes to vcpu->mode do not need atomic
> operations; cmpxchg is only needed _outside_ the mutex to ensure that
> the IN_GUEST_MODE->EXITING_GUEST_MODE change does not race with the vcpu
> thread going OUTSIDE_GUEST_MODE.
>
> Use this to optimize the case of a vCPU sending an interrupt to itself.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> virt/kvm/kvm_main.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3f6d450355f0..9f45f26fce4f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3325,6 +3325,19 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>  	if (kvm_vcpu_wake_up(vcpu))
>  		return;
>
> +	me = get_cpu();
> +	/*
> +	 * The only state change done outside the vcpu mutex is IN_GUEST_MODE
> +	 * to EXITING_GUEST_MODE.  Therefore the moderately expensive "should
> +	 * kick" check does not need atomic operations if kvm_vcpu_kick is used
> +	 * within the vCPU thread itself.
> +	 */
> +	if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
> +		if (vcpu->mode == IN_GUEST_MODE)
> +			WRITE_ONCE(vcpu->mode, EXITING_GUEST_MODE);

Fun. I had a whole thing typed out about this being unsafe because it implicitly
relies on a pending request and that there's a kvm_vcpu_exit_request() check _after_
this kick. Then I saw your other patches, and then I realized we already have this
bug in the kvm_arch_vcpu_should_kick() below.

Anyways, I also think we should do:

	if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
		if (vcpu->mode == IN_GUEST_MODE &&
		    !WARN_ON_ONCE(!kvm_request_pending(vcpu)))
			WRITE_ONCE(vcpu->mode, EXITING_GUEST_MODE);
		goto out;
	}

The idea being that delaying or even missing an event in case of a KVM bug is
preferable to letting the vCPU state become invalid due to running in the guest
with EXITING_GUEST_MODE.
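
To make the intended behavior concrete, here is a rough user-space model of the self-kick path. It is only a sketch, not kernel code: struct model_vcpu, model_self_kick() and the warnings counter are made-up stand-ins for struct kvm_vcpu, the snippet above and WARN_ON_ONCE(), and the request_pending flag is assumed to play the role of kvm_request_pending().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's vcpu->mode values. */
enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

/* Hypothetical, stripped-down model of struct kvm_vcpu. */
struct model_vcpu {
	enum vcpu_mode mode;
	bool request_pending;	/* assumed equivalent of kvm_request_pending() */
	unsigned int warnings;	/* counts would-be WARN_ON_ONCE() hits */
};

/*
 * Model of a vCPU kicking itself.  No atomics are used, on the assumption
 * that the caller is the vCPU thread and so owns vcpu->mode.  Returns true
 * if the mode was moved to EXITING_GUEST_MODE.
 */
static bool model_self_kick(struct model_vcpu *vcpu)
{
	if (vcpu->mode != IN_GUEST_MODE)
		return false;

	/*
	 * A self-kick without a pending request would be a bug.  Record a
	 * warning and leave the mode alone rather than risk entering the
	 * guest with EXITING_GUEST_MODE set.
	 */
	if (!vcpu->request_pending) {
		vcpu->warnings++;
		return false;
	}

	vcpu->mode = EXITING_GUEST_MODE;
	return true;
}
```

In this model the buggy case (no request pending) leaves the mode at IN_GUEST_MODE and only bumps the warning count, i.e. the event is delayed or dropped but the mode stays valid.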