Re: [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit

From: Sean Christopherson
Date: Mon Nov 21 2022 - 12:51:23 EST


On Mon, Nov 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-11-17 at 20:04 +0000, Sean Christopherson wrote:
> > On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > > Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not),
> > > was an attempt to keep the int_ctl field in the vmcb12 cache
> > > up to date on each VM exit.
> >
> > This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
> > VMCB controls updated by the processor on every vmexit"), which states that the
> > goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
> > cache you are referring to?
>
> Thanks for digging that commit out.
>
> My reasoning was that cache contains both control and 'save' fields, and
> we don't update 'save' fields on each VM exit.
>
> For control it indeed looks like int_ctl and eventinj are the only fields
> that are updated by the CPU, although IMHO they don't *need* to be updated
> until we do a nested VM exit, because the VM isn't supposed to look at vmcb
> while it is in use by the CPU, its state is undefined.

The point of the cache isn't to forward info to L2 though, it's so that KVM can
query the effective VMCB state without having to read guest memory and/or track
where the current state lives.

> For migration though, this does look like a problem. It can be fixed during
> reading the nested state but it is a hack as well.
>
> My idea was as you had seen in the patches it to unify int_ctl handling,
> since some bits might need to be copied to vmcb12 but some to vmcb01,
> and we happened to have none of these so far, and it "happened" to work.
>
> Do you have an idea on how to do this cleanly? I can just leave this as is
> and only sync the bits of int_ctl from vmcb02 to vmcb01 on nested VM exit.
> Ugly but would work.

That honestly seems like the best option to me. The ugly part isn't as much KVM's
caching as it is the mixed, conditional behavior of int_ctl. E.g. VMX has even
more caching and synchronization (eVMCS, shadow VMCS, etc...), but off the top of
my head I can't think of any scenarios where KVM needs to splice/split VMCS fields.
KVM needs to conditionally sync fields, but not split like this.

In other words, I think this particular code is going to be rather ugly no matter
what KVM does.