Re: [PATCH v10 047/108] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases

From: Ackerley Tng
Date: Tue Nov 22 2022 - 16:26:18 EST


From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>

Architecturally, TDX supports only the write-back (WB) memory type for
private memory, so a (virtualized) memory type change doesn't make sense
for private memory. Also, page migration isn't supported for TDX yet.
(TDX architecturally supports page migration; the limitation is in the
KVM and kernel implementation.)

For a memory type change (MTRR virtualization or a LAPIC page mapping
change), pages are zapped by kvm_zap_gfn_range(). On the next KVM page
fault, the SPTE with the new memory type for the page is populated.
For page migration, pages are zapped by the mmu notifier. On the next
KVM page fault, the newly migrated page is populated. Don't zap private
pages on unmapping in those two cases.

When deleting or moving a KVM memory slot, which typically happens when
tearing down the VM, zap private pages but don't invalidate private page
tables, i.e. zap only leaf SPTEs for a KVM mmu that has a shared bit
mask. The existing kvm_tdp_mmu_invalidate_all_roots() depends on
role.invalid with mmu_lock held for read so that other vCPUs can operate
on the KVM mmu concurrently. It marks the root page table invalid and
zaps the SPTEs of the root page tables. The TDX module doesn't allow
unlinking a protected root page table from the hardware and then
allocating a new one for it, i.e. replacing a protected root page table.
Instead, zap only leaf SPTEs for a KVM mmu with a shared bit mask set.

Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
---
arch/x86/kvm/mmu/mmu.c | 85 ++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/mmu/tdp_mmu.c | 24 ++++++++---
arch/x86/kvm/mmu/tdp_mmu.h | 5 ++-
3 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index faf69774c7ce..0237e143299c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1577,8 +1577,38 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
if (kvm_memslots_have_rmaps(kvm))
flush = kvm_handle_gfn_range(kvm, range, kvm_zap_rmap);

- if (is_tdp_mmu_enabled(kvm))
- flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
+ if (is_tdp_mmu_enabled(kvm)) {
+ bool zap_private;

zap_private should be initialized to true here. Otherwise it is read
uninitialized in the call

kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, zap_private)

whenever the condition

if (kvm_slot_can_be_private(range->slot)) {

evaluates to false.
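
To make the suggestion concrete, here is a minimal standalone sketch of
the decision logic with the initialization added. It is a userspace
model, not kernel code: decide_zap(), struct zap_decision, the RANGE_*
flags and the ATTR_* values are hypothetical stand-ins for the names in
the hunk, and the "skip" field models the early "return flush".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM_GFN_RANGE_FLAGS_* in the patch. */
enum {
	RANGE_RESTRICTED_MEM = 1 << 0,
	RANGE_SET_MEM_ATTR   = 1 << 1,
};

/* Hypothetical stand-ins for KVM_MEM_ATTR_SHARED/PRIVATE. */
enum mem_attr { ATTR_SHARED, ATTR_PRIVATE };

struct zap_decision {
	bool skip;        /* models the early "return flush" */
	bool zap_private; /* value that would be passed to the unmap call */
};

static struct zap_decision decide_zap(bool slot_can_be_private,
				      unsigned int flags, enum mem_attr attr)
{
	/* Assumed fix: default to true so the non-private path is defined. */
	struct zap_decision d = { .skip = false, .zap_private = true };

	if (slot_can_be_private) {
		if (flags & RANGE_RESTRICTED_MEM)
			d.skip = true;		/* request from restrictedmem */
		else if (flags & RANGE_SET_MEM_ATTR)
			d.zap_private = (attr == ATTR_SHARED);
		else
			d.zap_private = false;	/* mmu notifier path */
	}
	return d;
}

/* Small usage example covering the path the comment above is about. */
void check_decide_zap(void)
{
	/* Slot cannot be private: zap_private keeps its initial value. */
	assert(decide_zap(false, 0, ATTR_PRIVATE).zap_private);
	/* Attribute change to shared: private mappings are zapped. */
	assert(decide_zap(true, RANGE_SET_MEM_ATTR, ATTR_SHARED).zap_private);
	/* mmu notifier path on a private-capable slot: nothing is zapped. */
	assert(!decide_zap(true, 0, ATTR_SHARED).zap_private);
}
```

In the patch itself this would presumably amount to changing the
declaration to "bool zap_private = true;".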

+
+ if (kvm_slot_can_be_private(range->slot)) {
+ if (range->flags & KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM)
+ /*
+ * For private slot, the callback is triggered
+ * via falloc. Mode can be allocation or punch
+ * hole. Because the private-shared conversion
+ * is done via
+ * KVM_MEMORY_ENCRYPT_REG/UNREG_REGION, we can
+ * ignore the request from restrictedmem.
+ */
+ return flush;
+ else if (range->flags & KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR) {
+ if (range->attr == KVM_MEM_ATTR_SHARED)
+ zap_private = true;
+ else {
+ WARN_ON_ONCE(range->attr != KVM_MEM_ATTR_PRIVATE);
+ zap_private = false;
+ }
+ } else
+ /*
+ * kvm_unmap_gfn_range() is called via mmu
+ * notifier. For now page migration for private
+ * page isn't supported yet, don't zap private
+ * pages.
+ */
+ zap_private = false;
+ }
+ flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, zap_private);
+ }

return flush;
}