Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

From: Sean Christopherson
Date: Wed Aug 17 2022 - 12:41:13 EST


On Wed, Aug 17, 2022, Paolo Bonzini wrote:
> On 8/17/22 01:34, Sean Christopherson wrote:
> > Isn't freeing up the return from kvm_vcpu_check_block() unnecessary? Can't we
> > just do:
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9f11b505cbee..ccb9f8bdeb18 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
> > if (hv_timer)
> > kvm_lapic_switch_to_hv_timer(vcpu);
> >
> > - if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
> > + if (!kvm_arch_vcpu_runnable(vcpu))
> > return 1;
> > }
> >
> >
> > which IMO is more intuitive and doesn't require reworking halt-polling (again).
>
> This was my first idea indeed. However I didn't like calling
> kvm_arch_vcpu_runnable() again and "did it schedule()" seemed to be a less
> interesting result from kvm_vcpu_block() (and in fact kvm_vcpu_halt() does
> not bother passing it up the return chain).

The flip side of calling kvm_arch_vcpu_runnable() again is that KVM will immediately
wake the vCPU if it becomes runnable after kvm_vcpu_check_block(). The edge cases
where the vCPU becomes runnable late are unlikely to truly matter in practice, but
on the other hand manually re-checking kvm_arch_vcpu_runnable() means KVM gets both
cases "right" (waited=true iff vCPU actually waited, vCPU awakened ASAP), whereas
squishing the information into the return of kvm_vcpu_check_block() means KVM gets
both cases "wrong" (waited=true even if schedule() was never called, vCPU left in
a non-running state even though it's runnable).

My only hesitation with calling kvm_arch_vcpu_runnable() again is that it could be
problematic if KVM somehow managed to consume the event that caused kvm_vcpu_has_events()
to return true, but I don't see how that could happen without it being a KVM bug.

Ah, and if we do go with the explicit check, it should come with a comment to call
out that KVM needs to return up the stack, e.g.

/*
* If KVM stopped blocking the vCPU but the vCPU is still not
* runnable, then there is a pending host event of some kind,
* e.g. a pending signal. Return up the stack to service the
* pending event without changing the vCPU's activity state.
*/
if (!kvm_arch_vcpu_runnable(vcpu))
return 1;

so that we don't end combining the checks into:

while (!kvm_arch_vcpu_runnable(vcpu)) {
/*
* Switch to the software timer before halt-polling/blocking as
* the guest's timer may be a break event for the vCPU, and the
* hypervisor timer runs only when the CPU is in guest mode.
* Switch before halt-polling so that KVM recognizes an expired
* timer before blocking.
*/
hv_timer = kvm_lapic_hv_timer_in_use(vcpu);
if (hv_timer)
kvm_lapic_switch_to_sw_timer(vcpu);

kvm_vcpu_srcu_read_unlock(vcpu);
if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
kvm_vcpu_halt(vcpu);
else
kvm_vcpu_block(vcpu);
kvm_vcpu_srcu_read_lock(vcpu);

if (hv_timer)
kvm_lapic_switch_to_hv_timer(vcpu);
}

which looks sane, but would mean KVM never bounces out to handle whatever event
needs handling.

Side topic, usage of kvm_apic_accept_events() appears to be broken (though nothing
can trigger the bug). If kvm_apic_accept_events() were to return an -errno, then
kvm_arch_vcpu_ioctl_run() would return '0' to userspace without updating
vcpu->run->exit_reason. I think an easy fix is to drop the return value entirely
and then WARN if kvm_check_nested_events() returns something other than -EBUSY.

if (is_guest_mode(vcpu)) {
r = kvm_check_nested_events(vcpu);
if (r < 0) {
WARN_ON_ONCE(r != -EBUSY);
return;
}