Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT

From: Sean Christopherson
Date: Fri Feb 18 2022 - 16:45:41 EST


On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Whenever KVM knows the page role flags have changed, it needs to drop
> the current MMU root and possibly load one from the prev_roots cache.
> Currently it is papering over some overly simplistic code by just
> dropping _all_ roots, so that the root will be reloaded by
> kvm_mmu_reload, but this has bad performance for the TDP MMU
> (which drops the whole of the page tables when freeing a root,
> without the performance safety net of a hash table).
>
> To do this, KVM needs to do a kvm_mmu_update_root call from
> kvm_mmu_reset_context. Introduce a new request bit so that the call
> can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
> kill all hopes of finding a cached PGD.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---

Please no.

I really, really do not want to add yet another deferred-load in the nested
virtualization paths. As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should
never have been merged. And on that point, I've no idea how this new request will
interact with KVM_REQ_GET_NESTED_STATE_PAGES. It may be a complete non-issue, but
I'd honestly rather not have to spend the brain power.

And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
so that it no longer does kvm_mmu_unload(). There are currently eight kvm_mmu_reset_context() calls:

1. nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
consistency check, not at all a performance concern.

2. kvm_mmu_after_set_cpuid() - Still needs to unload. Not a perf concern.

3. kvm_vcpu_reset() - Relevant only to INIT. Not a perf concern, but could be
converted manually to a different path without too much fuss.

4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
be converted manually if anyone cares.

6. set_efer() - Silly corner case that basically requires host userspace abuse
of KVM APIs. Not a perf concern.

7+8. kvm_post_set_cr0/4() - These are the ones we really care about; they can be
handled quite trivially and can even share much of the logic with kvm_set_cr3().
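To make the 7+8 idea concrete, here is a toy userspace model, not KVM code: every type and helper in it (toy_vcpu, toy_switch_root(), toy_post_set_cr0(), and so on) is hypothetical. It assumes a root is identified by a (pgd, role) pair and that there is a small prev_roots-style cache. The CR0/CR4 path changes only the role and then reuses the same cached-root switch as the CR3 path, instead of dropping every root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every type and helper here is hypothetical. */
#define TOY_NR_PREV_ROOTS 3

struct toy_root {
	uint64_t pgd;   /* guest page-table base, i.e. CR3 */
	uint32_t role;  /* stand-in for the page role flags */
	bool valid;
};

struct toy_vcpu {
	struct toy_root root;                     /* currently loaded root */
	struct toy_root prev[TOY_NR_PREV_ROOTS];  /* prev_roots-style cache */
	uint64_t cr3;
	uint32_t role;
	int allocations;                          /* counts expensive root builds */
};

/*
 * Shared by the CR3 path and the CR0/CR4 path: make (pgd, role) the
 * current root, reusing a cached root when one matches.
 */
static void toy_switch_root(struct toy_vcpu *v, uint64_t pgd, uint32_t role)
{
	struct toy_root old = v->root;
	int i;

	if (old.valid && old.pgd == pgd && old.role == role)
		return;  /* nothing changed */

	for (i = 0; i < TOY_NR_PREV_ROOTS; i++) {
		if (v->prev[i].valid && v->prev[i].pgd == pgd &&
		    v->prev[i].role == role)
			break;
	}

	if (i < TOY_NR_PREV_ROOTS) {
		/* Cache hit: swap so the outgoing root stays cached. */
		v->root = v->prev[i];
		v->prev[i] = old;
	} else {
		/* Cache miss: push the outgoing root and build a new one. */
		for (i = TOY_NR_PREV_ROOTS - 1; i > 0; i--)
			v->prev[i] = v->prev[i - 1];
		v->prev[0] = old;
		v->root = (struct toy_root){ .pgd = pgd, .role = role, .valid = true };
		v->allocations++;
	}

	assert(v->root.valid && v->root.pgd == pgd && v->root.role == role);
}

/* Loosely mirrors "dropping _all_ roots": nothing survives in the cache. */
static void toy_unload_all(struct toy_vcpu *v)
{
	v->root.valid = false;
	for (int i = 0; i < TOY_NR_PREV_ROOTS; i++)
		v->prev[i].valid = false;
}

static void toy_set_cr3(struct toy_vcpu *v, uint64_t cr3)
{
	v->cr3 = cr3;
	toy_switch_root(v, cr3, v->role);
}

/* CR0/CR4 change: same CR3, new role, same cached-root switch as CR3. */
static void toy_post_set_cr0(struct toy_vcpu *v, uint32_t new_role)
{
	v->role = new_role;
	toy_switch_root(v, v->cr3, new_role);
}
```

In this model, flipping the role away and back costs one build for the new role and then gets a cache hit on the way back; calling toy_unload_all() before the role change instead forces a fresh build every time.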

I strongly prefer that we take a more conservative approach and fix 7+8, and then
tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
dropping roots.

Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series[2] to replace
that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS. Obsolete TDP MMU roots will never get
a cache hit because the obsolete root will have an "invalid" role. And if we care
about optimizing this with respect to a memslot (highly unlikely), then we could
add an MMU generation check in the cache lookup. I was planning on posting that
series as soon as this one is queued, but I'm more than happy to speculatively send
a refreshed version that applies on top of this series.
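A toy model of the cache-lookup point above, again with hypothetical names rather than the real KVM lookup: TOY_ROLE_INVALID, the per-root gen field, and the check_gen parameter are all assumptions. An obsolete root keeps its cache slot but carries an invalid role bit, so a lookup for a valid role can never hit it; the optional generation compare additionally rejects roots made stale by a generation bump alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and layout are hypothetical, not KVM's. */
#define TOY_ROLE_INVALID (1u << 31)  /* set when a root becomes obsolete */
#define TOY_NR_CACHED 3

struct toy_cached_root {
	uint64_t pgd;
	uint32_t role;
	uint64_t gen;   /* MMU generation the root was created in */
	bool in_use;
};

struct toy_mmu {
	struct toy_cached_root cache[TOY_NR_CACHED];
	uint64_t gen;   /* current MMU generation */
};

/* Obsolete every cached root by tagging its role; entries stay cached. */
static void toy_invalidate_all(struct toy_mmu *mmu)
{
	for (size_t i = 0; i < TOY_NR_CACHED; i++)
		mmu->cache[i].role |= TOY_ROLE_INVALID;
}

/* A generation bump alone leaves the cached roles untouched. */
static void toy_bump_gen(struct toy_mmu *mmu)
{
	mmu->gen++;
}

static struct toy_cached_root *toy_lookup(struct toy_mmu *mmu, uint64_t pgd,
					  uint32_t role, bool check_gen)
{
	/* Callers never ask for an invalid role, so obsolete roots miss. */
	assert(!(role & TOY_ROLE_INVALID));

	for (size_t i = 0; i < TOY_NR_CACHED; i++) {
		struct toy_cached_root *r = &mmu->cache[i];

		if (!r->in_use || r->pgd != pgd || r->role != role)
			continue;
		/* Optional extra filter: reject roots from an older generation. */
		if (check_gen && r->gen != mmu->gen)
			continue;
		return r;
	}
	return NULL;
}
```

The role tag alone is enough to keep invalidated roots from ever being reused; the generation compare only matters if something can make a root stale without touching its role.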

[1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@xxxxxxxxxxxxxx
[2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@xxxxxxxxxx