Re: [PATCH v10 045/108] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page

From: Huang, Kai
Date: Wed Nov 16 2022 - 05:46:08 EST


On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@xxxxxxxxx wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> For private GPA, CPU refers a private page table whose contents are
> encrypted. The dedicated APIs to operate on it (e.g. updating/reading its
> PTE entry) are used and their cost is expensive.
>
> When KVM resolves KVM page fault, it walks the page tables. To reuse the
> existing KVM MMU code and mitigate the heavy cost to directly walk
> protected (encrypted) page table, allocate one more page to copy the
> protected page table for KVM MMU code to directly walk. Resolve KVM page
> fault with the existing code, and do additional operations necessary for
> the protected page table. To distinguish such cases, the existing KVM page
> table is called a shared page table (i.e. not associated with protected
> page table), and the page table with protected page table is called a
> private page table. The relationship is depicted below.
>
> Add a private pointer to struct kvm_mmu_page for protected page table and
> add helper functions to allocate/initialize/free a protected page table
> page.
>
> KVM page fault |
> | |
> V |
> -------------+---------- |
> | | |
> V V |
> shared GPA private GPA |
> | | |
> V V |
> shared PT root private PT root | protected PT root
> | | | |
> V V | V
> shared PT private PT ----propagate----> protected PT
> | | | |
> | \-----------------+------\ |
> | | | |
> V | V V
> shared guest page | private guest page
> |
> non-encrypted memory | encrypted memory
> |
> PT: page table
> - Shared PT is visible to KVM and it is used by CPU.
> - Protected PT is used by CPU but it is invisible to KVM.
> - Private PT is visible to KVM but not used by CPU. It is used to
> propagate PT change to the actual protected PT which is used by CPU.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 7 +++
> arch/x86/kvm/mmu/mmu.c | 8 +++
> arch/x86/kvm/mmu/mmu_internal.h | 90 +++++++++++++++++++++++++++++++--
> arch/x86/kvm/mmu/tdp_mmu.c | 1 +
> 4 files changed, 102 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ee01add57a6b..381df2c8136d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -754,6 +754,13 @@ struct kvm_vcpu_arch {
> struct kvm_mmu_memory_cache mmu_shadow_page_cache;
> struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> struct kvm_mmu_memory_cache mmu_page_header_cache;
> + /*
> + * This cache is to allocate private page table. E.g. Secure-EPT used
> + * by the TDX module. Because the TDX module doesn't trust VMM and
> + * initializes the pages itself, KVM doesn't initialize them. Allocate
> + * pages with garbage and give them to the TDX module.
> + */

Perhaps put "Because ..." part to kvm_mmu_create() so people can see why there's
no initialization for mmu_private_spte_cache?

> + struct kvm_mmu_memory_cache mmu_private_spt_cache;

Nit:

In changelog you mentioned the page table used by CPU is actually "protected
PT". IIUC the sp->private_spt you allocated from this cache is the one that is
actually used by CPU. So looks there's some inconsistency here.


[...]


> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu,
> + struct kvm_mmu_memory_cache *private_spt_cache,
> + struct kvm_mmu_page *sp)

This function is very weird in the context of this patch. _Only_ a new vcpu-
scope 'mmu_private_spte_cache' is added in this patch, but here you allow caller
to pass an additional argument of private_spt_cache. So there must be another
cache which is not introduced in this patch?

> +{
> + /*
> + * vcpu == NULL means non-root SPT:
> + * vcpu == NULL is used to split a large SPT into smaller SPT. Root SPT
> + * is not a large SPT.

I am guessing this "vcpu == NULL" case is for "Eager Splitting"?

If so, why not adding a global MMU cache for private_spt allocation, and make
vcpu->arch.mmu_private_spt_cache point to the global cache? In this case, in
the context where you only have 'kvm', you can use the global cache directly.
And in the context where you have a 'vcpu', you just use vcpu's cache.

> + */
> + bool is_root = vcpu &&
> + vcpu->arch.root_mmu.root_role.level == sp->role.level;
> +
> + if (vcpu)
> + private_spt_cache = &vcpu->arch.mmu_private_spt_cache;
> + KVM_BUG_ON(!kvm_mmu_page_role_is_private(sp->role), vcpu->kvm);
> + if (is_root)
> + /*
> + * Because TDX module assigns root Secure-EPT page and set it to
> + * Secure-EPTP when TD vcpu is created, secure page table for
> + * root isn't needed.
> + */
> + sp->private_spt = NULL;
> + else {
> + sp->private_spt = kvm_mmu_memory_cache_alloc(private_spt_cache);
> + /*
> + * Because mmu_private_spt_cache is topped up before staring kvm
> + * page fault resolving, the allocation above shouldn't fail.
> + */
> + WARN_ON_ONCE(!sp->private_spt);
> + }
> +}
> +