Re: [PATCH] KVM: X86: set EXITING_GUEST_MODE as soon as vCPU exits

From: Jon Kohler
Date: Mon Dec 05 2022 - 10:10:42 EST




> On Dec 1, 2022, at 2:17 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Nov 29, 2022, Jon Kohler wrote:
>> Set vcpu->mode to EXITING_GUEST_MODE as soon vCPU exits to reflect
>> that we are indeed exiting guest mode, but not quite out of guest
>> mode yet. Note: This is done lazily without an explicit memory
>> barrier so that we do not regress the cost in the critical path
>> of going from the exit to the exit handler.
>
> This is not remotely sufficient justification. Memory "barriers" are just compiler
> barriers on x86, so the odds of regressing the VM-Enter/VM-Exit cost without
> introducing a bug are miniscule.

I was talking about hardware memory barriers, not compiler memory barriers. I’ll
tune that up and take a different approach on v2.

>
>> Flip back to IN_GUEST_MODE for exits that use
>> EXIT_FASTPATH_REENTER_GUEST, such that we are IN_GUEST_MODE upon
>> reentry.
>>
>> Changing vcpu->mode away from IN_GUEST_MODE as early as possible
>
> Except this isn't as early as possible. If we're going to bother doing something
> like this, my vote is to move it into assembly.

In vmenter.S, tacking on to call vmx_spec_ctrl_restore_host seemed like the
most logical place after handling all of the state saves and RSB work. Are you
saying put it even closer to the exit, meaning before the FILL_RETURN_BUFFER?

>
>> gives IPI senders as much runway as possible to avoid ringing
>> doorbell or sending posted interrupt IPI in AMD and Intel,
>> respectively. Since this is done without an explicit memory
>> barrier, the worst case is that the IPI sender sees IN_GUEST_MODE
>> still and sends a spurious event, which is the behavior prior
>> to this patch.
>
> No, worst case scenario is that kvm_vcpu_exiting_guest_mode() sees EXITING_GUEST_MODE
> and doesn't kick the vCPU. For "kicks" that set a request, kvm_vcpu_exit_request()
> will punt the vCPU out of the tight run loop, though there might be ordering issues.
>
> But whether or not there are ordering issues is a moot point since there are uses
> of kvm_vcpu_kick() that aren't accompanied by a request, e.g. to purge the PML
> buffers. In other words, kvm_vcpu_kick() absolutely cannot have false negatives.
> We could modify KVM to require a request when using kvm_vcpu_kick(), but that's
> a bit of a hack, and all of the possible ordering problems is still a pile of
> complexity I'd rather avoid.
>
> No small part of me thinks we'd be better off adding a dedicated flag to very
> precisely track whether or not the vCPU is truly "in the guest" for the purposes
> of sending IPIs. Things like kicks have different requirements around IN_GUEST_MODE
> than sending interrupts, e.g. KVM manually processes the IRR on every VM-Enter and
> so lack of an IPI is a non-issue, whereas missing an IPI for a kick is problematic.
> In other words, EXITING_GUEST_MODE really needs to mean "existing the run loop".

Do you mean:
“one small part” (as in give this a shot, maybe),
or
“no small part” (as in good-god-don’t-do-this!)

I’m assuming you meant one small part :) sure, how about something like:

To my earlier comment about when to do this within a few instructions, I don’t want
to clobber other stuff happening as part of the enter/exit, what if we repurposed/renamed
vmx_update_host_rsp and vmx_spec_ctrl_restore_host to make them “do stuff before
entry” and “do stuff right after entry returns” functions. That way we wouldn’t have to
add another other function calls or change the existing control flow all that much.

In “do before” we could set something like vcpu->non_root_mode = true
In “do after” we could set vcpu->non_root_mode = false

Or perhaps something like (respectively)
vcpu->operational_state = NON_ROOT_MODE
vcpu->operational_state = ROOT_MODE

Using the root/non-root moniker would precisely track when the whether the
vCPU is in guest, and aligns with the language used to describe such a state
from the SDM.

Thoughts?


>
> E.g. toggle the dedicated flag within a few instructions of VM-Enter and
> VM-Exit for maximum efficiency for interrupts, and avoid having to make vcpu->mode
> more complex than it already is.
>
> To add clarity, we could even rename IN_GUEST_MODE and EXITING_GUEST_MODE to
> something like IN_RUN_LOOP and EXITING_RUN_LOOP.

Yea, this feels like a good idea to reduce confusion. Let’s nail this thread down, then I
can propose a wider cleanup?

>
>> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
>> ---
>> arch/x86/kvm/svm/svm.c | 7 +++++++
>> arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
>> arch/x86/kvm/x86.c | 8 ++++++++
>> 3 files changed, 38 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index ce362e88a567..5f0c118a3ffd 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3907,6 +3907,13 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
>> else
>> __svm_vcpu_run(svm, spec_ctrl_intercepted);
>>
>> + /* Optimize IPI reduction by setting mode immediately after vmexit
>
> /*
> * Because KVM isn't the crazy land of net/ block comments should like
> * this.
> */

Ack, will change in v2

>
>> + * without a memmory barrier as this as not paired anywhere.
>> + * is will be set to OUTSIDE_GUEST_MODE in x86 common code with a memory
>> + * barrier, after the host is done fully restoring various host states.
>> + */
>> + vcpu->mode = EXITING_GUEST_MODE;
>> +
>> guest_state_exit_irqoff();
>> }
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 63247c57c72c..243dcb87c727 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -5878,6 +5878,17 @@ static fastpath_t handle_fastpath_preemption_timer(struct kvm_vcpu *vcpu)
>>
>> if (!vmx->req_immediate_exit &&
>> !unlikely(vmx->loaded_vmcs->hv_timer_soft_disabled)) {
>> + /* Reset IN_GUEST_MODE since we're going to reenter
>> + * guest as part of this fast path. This is done as
>> + * an optimization without a memory barrier since
>> + * EXITING_GUEST_MODE is also set without a memory
>
> Heh, justifying the lack of a memory barrier by saying pointing out the other
> code you added doesn't use a memory barrier is interesting, to put it politely.

Message received! Will mop it up.

>
>> + * barrier. This also needs to be reset prior to
>> + * calling apic_timer_expired() so that
>> + * kvm_use_posted_timer_interrupt() returns the proper
>> + * value.
>> + */
>> + if (vcpu->mode == EXITING_GUEST_MODE)
>> + vcpu->mode = IN_GUEST_MODE;
>
> It's far easier, likely more performant, documents why this has a chance of working,
> and significantly less error prone to do this unconditionally in either assembly
> or after the EXIT_FASTPATH_REENTER_GUEST check in vcpu_enter_guest().

Ack, will do that there.