Re: [PATCH v7 044/102] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page

From: Isaku Yamahata
Date: Tue Aug 09 2022 - 19:51:09 EST


On Thu, Jul 28, 2022 at 01:13:35PM -0700,
David Matlack <dmatlack@xxxxxxxxxx> wrote:

> On Mon, Jun 27, 2022 at 02:53:36PM -0700, isaku.yamahata@xxxxxxxxx wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> > For a private GPA, the CPU refers to a private page table whose contents
> > are encrypted. Dedicated APIs must be used to operate on it (e.g. to
> > update/read a PTE entry), and they are expensive.
> >
> > When KVM resolves a KVM page fault, it walks the page tables. To reuse the
> > existing KVM MMU code and mitigate the heavy cost of directly walking the
> > encrypted private page table, allocate one more page to mirror the existing
> > KVM page table. Resolve the KVM page fault with the existing code, and do
> > the additional operations necessary for the mirrored private page table.
> > To distinguish the cases, the existing KVM page table is called a shared
> > page table (i.e. it has no mirrored private page table), and the KVM page
> > table with a mirrored private page table is called a private page table.
> > The relationship is depicted below.
> >
> > Add a private pointer to struct kvm_mmu_page for the mirrored private page
> > table and add helper functions to allocate/initialize/free a mirrored
> > private page table page. Also, add helper functions to check whether a
> > given kvm_mmu_page is private. A later patch introduces hooks to operate
> > on the mirrored private page table.
> >
> >               KVM page fault                     |
> >                      |                           |
> >                      V                           |
> >         -------------+----------                 |
> >         |                      |                 |
> >         V                      V                 |
> >      shared GPA           private GPA            |
> >         |                      |                 |
> >         V                      V                 |
> >  CPU/KVM shared PT root  KVM private PT root     |   CPU private PT root
> >         |                      |                 |           |
> >         V                      V                 |           V
> >  shared PT                private PT <----mirror----> mirrored private PT
> >         |                      |                 |           |
> >         |                      \-----------------+------\    |
> >         |                                        |      |    |
> >         V                                        |      V    V
> >   shared guest page                              |    private guest page
> >                                                  |
> >   non-encrypted memory                           |    encrypted memory
> >                                                  |
> >  PT: page table
> >
> > Both the CPU and KVM refer to the CPU/KVM shared page table. The private
> > page table is used only by KVM. The CPU refers to the mirrored private
> > page table.
> >
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/mmu/mmu.c          |  9 ++++
> >  arch/x86/kvm/mmu/mmu_internal.h | 84 +++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/mmu/tdp_mmu.c      |  3 ++
> >  4 files changed, 97 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f4d4ed41641b..bfc934dc9a33 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -716,6 +716,7 @@ struct kvm_vcpu_arch {
> >  	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
> >  	struct kvm_mmu_memory_cache mmu_gfn_array_cache;
> >  	struct kvm_mmu_memory_cache mmu_page_header_cache;
> > +	struct kvm_mmu_memory_cache mmu_private_sp_cache;
> >
> >  	/*
> >  	 * QEMU userspace and the guest each have their own FPU state.
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index c517c7bca105..a5bf3e40e209 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -691,6 +691,13 @@ static int mmu_topup_shadow_page_cache(struct kvm_vcpu *vcpu)
> >  	int start, end, i, r;
> >  	bool is_tdp_mmu = is_tdp_mmu_enabled(vcpu->kvm);
> >
> > +	if (kvm_gfn_shared_mask(vcpu->kvm)) {
> > +		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_sp_cache,
> > +					       PT64_ROOT_MAX_LEVEL);
> > +		if (r)
> > +			return r;
> > +	}
> > +
> >  	if (is_tdp_mmu && shadow_nonpresent_value)
> >  		start = kvm_mmu_memory_cache_nr_free_objects(mc);
> >
> > @@ -732,6 +739,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> >  {
> >  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> >  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> > +	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_sp_cache);
> >  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
> >  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> >  }
> > @@ -1736,6 +1744,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
> >  	if (!direct)
> >  		sp->gfns = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
> >  	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> > +	kvm_mmu_init_private_sp(sp, NULL);
> >
> >  	/*
> >  	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index 44a04fad4bed..9f3a6bea60a3 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -55,6 +55,10 @@ struct kvm_mmu_page {
> >  	u64 *spt;
> >  	/* hold the gfn of each spte inside spt */
> >  	gfn_t *gfns;
> > +#ifdef CONFIG_KVM_MMU_PRIVATE
> > +	/* associated private shadow page, e.g. SEPT page. */
> > +	void *private_sp;
> > +#endif
>
> write_flooding_count and unsync_children are only used in shadow MMU SPs
> and private_sp is only used in TDP MMU SPs. So it seems like we could
> put these together in a union and drop CONFIG_KVM_MMU_PRIVATE without
> increasing the size of kvm_mmu_page. i.e.

I introduced KVM_MMU_PRIVATE as an alias to INTEL_TDX_HOST because I don't
want to reference INTEL_TDX_HOST directly in kvm/mmu, and I'd like
KVM_MMU_PRIVATE to be (sort of) independent of INTEL_TDX_HOST. Anyway, once
the patch series is merged, we can drop KVM_MMU_PRIVATE.
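
To illustrate what I meant: with KVM_MMU_PRIVATE, kvm/mmu only tests that
one symbol and falls back to no-op stubs when it is off, without ever
referencing INTEL_TDX_HOST. Roughly like this (illustrative sketch, not the
exact hunk from this patch):

#ifdef CONFIG_KVM_MMU_PRIVATE
static inline bool is_private_sp(struct kvm_mmu_page *sp)
{
	/* The mirrored private page table hangs off private_sp. */
	return !!sp->private_sp;
}

static inline void kvm_mmu_init_private_sp(struct kvm_mmu_page *sp,
					   void *private_sp)
{
	sp->private_sp = private_sp;
}
#else
static inline bool is_private_sp(struct kvm_mmu_page *sp)
{
	return false;
}

static inline void kvm_mmu_init_private_sp(struct kvm_mmu_page *sp,
					   void *private_sp)
{
}
#endif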


> 	union {
> 		struct {
> 			unsigned int unsync_children;
> 			/* Number of writes since the last time traversal visited this page. */
> 			atomic_t write_flooding_count;
> 		};
> 		/*
> 		 * The associated private shadow page table, e.g. for Secure EPT.
> 		 * Only valid if tdp_mmu_page is true.
> 		 */
> 		void *private_spt;
> 	};
>
> Then change is_private_sp() to:
>
> static inline bool is_private_sp(struct kvm_mmu_page *sp)
> {
> 	return sp->tdp_mmu_page && sp->private_spt;
> }
>
> This will allow us to drop CONFIG_KVM_MMU_PRIVATE, the only benefit of
> which I see is to avoid increasing the size of kvm_mmu_page. However,
> to actually realize that benefit, cloud vendors (for example) would
> have to create separate kernel builds for TDX and non-TDX hosts, which
> seems like a huge hassle.

Good idea. I'll use a union.
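
Concretely, for the next version I'm thinking of something along these
lines (untested sketch, still using the private_sp name from this series;
whether to rename it to private_spt is a separate question):

	union {
		struct {
			/* Only used by shadow-MMU pages. */
			unsigned int unsync_children;
			/* Number of writes since the last time traversal visited this page. */
			atomic_t write_flooding_count;
		};
		/*
		 * Mirrored private page table, e.g. a Secure-EPT page.
		 * Only used by TDP MMU pages, so it can share space with
		 * the shadow-MMU-only fields above.
		 */
		void *private_sp;
	};

static inline bool is_private_sp(struct kvm_mmu_page *sp)
{
	return sp->tdp_mmu_page && sp->private_sp;
}

With that, the #ifdef CONFIG_KVM_MMU_PRIVATE around the field goes away and
kvm_mmu_init_private_sp() can become a plain assignment.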
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>