Re: [PATCH] KVM: nVMX: Prevent vmlaunch with EPTP pointing outside assigned memory area

From: Sean Christopherson
Date: Thu Jun 29 2023 - 13:56:34 EST


On Thu, Jun 29, 2023, Yan Zhao wrote:
> On Wed, Jun 28, 2023 at 08:37:45AM -0700, Sean Christopherson wrote:
> ...
> > So I think we should try this:
> >
> > ---
> > arch/x86/kvm/mmu/mmu.c | 19 -------------------
> > include/linux/kvm_host.h | 1 -
> > virt/kvm/kvm_main.c | 13 ++-----------
> > 3 files changed, 2 insertions(+), 31 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 60397a1beda3..e305737edf84 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3671,19 +3671,6 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
> > }
> > EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
> >
> > -
> > -static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
> > -{
> > - int ret = 0;
> > -
> > - if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
> > - kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> > - ret = 1;
> > - }
> > -
> > - return ret;
> > -}
> > -
> > static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
> > u8 level)
> > {
> > @@ -3821,9 +3808,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
> > root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> > root_gfn = root_pgd >> PAGE_SHIFT;
> >
> > - if (mmu_check_root(vcpu, root_gfn))
> > - return 1;
> > -
> Hi Sean,
> The checking of existence of memslot is still useful,
> Otherwise NULL pointer dereference would be met as in below call trace.
>
> mmu_alloc_shadow_roots
> |->mmu_alloc_root
> |->mmu_alloc_root(root_gfn)
> |->kvm_mmu_get_shadow_page
> |->__kvm_mmu_get_shadow_page
> |->kvm_mmu_alloc_shadow_page
> |->account_shadowed
> |->slot = __gfn_to_memslot(slots, gfn); ==>NULL pointer
> | kvm_slot_page_track_add_page(kvm, slot, gfn,KVM_PAGE_TRACK_WRITE);
> |->update_gfn_track(slot, gfn, mode, 1);
> |->index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K); ==>NULL pointer dereference

Argh, right, the internal memslot might "work", but the no memslot case will not.
The non-root path effectively injects a page fault if there's no memslot.

Oof, and simply skipping the accounting for the no-slot case would result in an
imbalanced world if userspace creates a valid memslot before unaccount_shadowed()
is called.

As much as it pains me to propagate KVM's arbitrary behavior, doing the right from
an architectural perspective is really gross for KVM, e.g. would need all kinds of
dedicated code in the MMU.

I still don't like adding a non-architectural check to nested_vmx_check_eptp(),
especially since there would be a race, e.g. if a memslot were deleted between
nested_vmx_check_eptp() and allocating the root.

This is the least awful solution I can think of (not yet tested):

From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Thu, 29 Jun 2023 10:54:12 -0700
Subject: [PATCH] KVM: nVMX: Allow triple fault shutdown in L2 to cancel nested
VM-Enter

<need to test and write a changelog>

Reported-by: Reima Ishii <ishiir@xxxxxxxxxxxxxxxxxxx>
Cc: Yan Zhao <yan.y.zhao@xxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/vmx/nested.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 22e08d30baef..6da6db575a27 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4729,8 +4729,15 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
/* Pending MTF traps are discarded on VM-Exit. */
vmx->nested.mtf_pending = false;

- /* trying to cancel vmlaunch/vmresume is a bug */
- WARN_ON_ONCE(vmx->nested.nested_run_pending);
+ /*
+ * Canceling VMLAUNCH/VMRESUME is a KVM bug, but triple fault shutdown
+ * is allowed due to limitations with KVM's shadow MMU, which requires
+ * shadowed page tables to be associated with a valid memslot, and KVM
+ * can't complete VM-Enter without a valid shadow root.
+ */
+ WARN_ON_ONCE(vmx->nested.nested_run_pending &&
+ vm_exit_reason != EXIT_REASON_TRIPLE_FAULT);
+ vmx->nested.nested_run_pending = false;

if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
/*

base-commit: 61eb0bb88e6c154a5e19e62edd99299a86a2c6a7
--