Re: [PATCH 17/15] KVM: X86: Ensure pae_root to be reconstructed for shadow paging if the guest PDPTEs is changed

From: Sean Christopherson
Date: Tue Dec 07 2021 - 19:15:52 EST


On Thu, Nov 11, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> For shadow paging, the pae_root needs to be reconstructed before the
> next VMENTER if the guest PDPTEs are changed.
>
> But not all paths that call load_pdptrs() will cause the pae_root to be
> reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
> are used to trigger the later reconstruction.
>
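
For context: each pae_root entry is derived directly from the corresponding
guest PDPTE when the shadow roots are allocated, roughly along these lines
(a simplified, untested sketch of the PAE loop in mmu_alloc_shadow_roots();
error handling and the pm_mask setup are omitted):

	for (i = 0; i < 4; ++i) {
		u64 pdptr = mmu->get_pdptr(vcpu, i);

		/* A not-present PDPTE gets no shadow root. */
		if (!(pdptr & PT_PRESENT_MASK)) {
			mmu->pae_root[i] = INVALID_PAE_ROOT;
			continue;
		}
		/*
		 * The shadow root for quadrant i is built from PDPTE i,
		 * so a stale PDPTE leaves a stale pae_root entry behind
		 * unless the roots are freed and reallocated.
		 */
		root = mmu_alloc_root(vcpu, pdptr >> PAGE_SHIFT, i << 30,
				      PT32_ROOT_LEVEL, true);
		mmu->pae_root[i] = root | pm_mask;
	}
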
> The commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and
> CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
> when changing CR0.CD and CR0.NW.
>
> The commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
> PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
> load_pdptrs() when rewriting the CR3 with the same value.

This isn't accurate; prior to that commit, KVM wasn't guaranteed to do
kvm_mmu_free_roots() if it got a hit on the current CR3 or if a previous CR3 in
the cache matched the new CR3 (the "cache" has done some odd things in the past).

So I think this particular flavor would be:

Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
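
I.e. something like the following (a paraphrased, untested sketch of the MOV
CR3 handling after that commit, not the actual code):

	/*
	 * The fast path returns early on a cache hit, so
	 * kvm_mmu_free_roots() is skipped even though load_pdptrs() may
	 * have just read new PDPTEs.
	 */
	if (fast_cr3_switch(vcpu, new_cr3))	/* cache hit: reuse root */
		return;				/* pae_root never freed */
	kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);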

> The commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
> toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
> load_pdptrs() when changing CR4.PGE.
>
> Normally, the guest doesn't change its PDPTEs while performing only the
> above operations, i.e. without touching other bits that would force
> pae_root to be reconstructed. Guests like Linux keep the PDPTEs
> unchanged for the lifetime of each page table.
>
> Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
> Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
> Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> ---
> arch/x86/kvm/x86.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0176eaa86a35..cfba337e46ab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>  	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
>  		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>  		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> -		/* Ensure the dirty PDPTEs to be loaded. */
> -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		/*
> +		 * Ensure the dirty PDPTEs to be loaded for VMX with EPT
> +		 * enabled or pae_root to be reconstructed for shadow paging.
> +		 */
> +		if (tdp_enabled)
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		else
> +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);

Shouldn't matter since it's legacy shadow paging, but @mmu should be used instead
of vcpu->arch.mmu.

To avoid a dependency on the previous patch, I think it makes sense to have this be:

	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);

before the memcpy().
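
E.g. the end result would look something like this (untested, just to show
the placement relative to the memcpy()):

	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);

	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
		/* Ensure the dirty PDPTEs are loaded. */
		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
	}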

Then we can decide independently whether it's safe to skip KVM_REQ_LOAD_MMU_PGD
when the PDPTRs are unchanged with respect to the MMU.