Re: [PATCH v10 049/108] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

From: Binbin Wu
Date: Tue Nov 08 2022 - 08:42:07 EST



On 2022/10/30 14:22, isaku.yamahata@xxxxxxxxx wrote:
From: Isaku Yamahata<isaku.yamahata@xxxxxxxxx>

Allocate protected page table for private page table, and add hooks to
operate on protected page table. This patch adds allocation/free of
protected page tables and hooks. When calling hooks to update SPTE entry,
freeze the entry, call hooks and unfree

unfreeze


the entry to allow concurrent
updates on page tables. Which is the advantage of TDP MMU. As
kvm_gfn_shared_mask() returns false always, those hooks aren't called yet
with this patch.

When the faulting GPA is private, the KVM fault is called private. When
resolving private KVM,

private KVM fault?


allocate protected page table and call hooks to
operate on protected page table. On the change of the private PTE entry,
invoke kvm_x86_ops hook in __handle_changed_spte() to propagate the change
to protected page table. The following depicts the relationship.

private KVM page fault |
| |
V |
private GPA | CPU protected EPTP
| | |
V | V
private PT root | protected PT root
| | |
V | V
private PT --hook to propagate-->protected PT
| | |
\--------------------+------\ |
| | |
| V V
| private guest page
|
|
non-encrypted memory | encrypted memory
|
PT: page table

The existing KVM TDP MMU code uses atomic update of SPTE. On populating
the EPT entry, atomically set the entry. However, it requires TLB
shootdown to zap SPTE. To address it, the entry is frozen with the special
SPTE value that clears the present bit. After the TLB shootdown, the entry
is set to the eventual value (unfreeze).

For protected page table, hooks are called to update protected page table
in addition to direct access to the private SPTE. For the zapping case, it
works to freeze the SPTE. It can call hooks in addition to TLB shootdown.
For populating the private SPTE entry, there can be a race condition
without further protection

vcpu 1: populating 2M private SPTE
vcpu 2: populating 4K private SPTE
vcpu 2: TDX SEAMCALL to update 4K protected SPTE => error
vcpu 1: TDX SEAMCALL to update 2M protected SPTE

To avoid the race, the frozen SPTE is utilized. Instead of atomic update
of the private entry, freeze the entry, call the hook that update protected
SPTE, set the entry to the final value.

Support 4K page only at this stage. 2M page support can be done in future
patches.

Co-developed-by: Kai Huang<kai.huang@xxxxxxxxx>
Signed-off-by: Kai Huang<kai.huang@xxxxxxxxx>
Signed-off-by: Isaku Yamahata<isaku.yamahata@xxxxxxxxx>
---
arch/x86/include/asm/kvm-x86-ops.h | 5 +
arch/x86/include/asm/kvm_host.h | 11 ++
arch/x86/kvm/mmu/mmu.c | 15 +-
arch/x86/kvm/mmu/mmu_internal.h | 32 ++++
arch/x86/kvm/mmu/tdp_iter.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 244 +++++++++++++++++++++++++----
arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
virt/kvm/kvm_main.c | 1 +
8 files changed, 280 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index f28c9fd72ac4..1b01dc2098b0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,6 +94,11 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
KVM_X86_OP(load_mmu_pgd)
+KVM_X86_OP_OPTIONAL(link_private_spt)
+KVM_X86_OP_OPTIONAL(free_private_spt)
+KVM_X86_OP_OPTIONAL(set_private_spte)
+KVM_X86_OP_OPTIONAL(remove_private_spte)
+KVM_X86_OP_OPTIONAL(zap_private_spte)
KVM_X86_OP(has_wbinvd_exit)
KVM_X86_OP(get_l2_tsc_offset)
KVM_X86_OP(get_l2_tsc_multiplier)

@@ -509,9 +524,81 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
WARN_ON_ONCE(ret);
}
+ if (is_private_sp(sp) &&
+ WARN_ON(static_call(kvm_x86_free_private_spt)(kvm, sp->gfn, sp->role.level,
+ kvm_mmu_private_spt(sp)))) {
+ /*
+ * Failed to unlink Secure EPT page and there is nothing to do
+ * further. Intentionally leak the page to prevent the kernel
+ * from accessing the encrypted page.
+ */
+ kvm_mmu_init_private_spt(sp, NULL);

Do you think is it better to add some statistics for the intentinal leakage?


+ }
+
call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
}
pu, until a matching vcpu_put()