Re: [PATCH v17 059/116] KVM: TDX: Create initial guest memory

From: Binbin Wu
Date: Thu Nov 16 2023 - 01:35:45 EST




On 11/7/2023 10:56 PM, isaku.yamahata@xxxxxxxxx wrote:
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

Because the guest memory is protected in TDX, creating the initial guest
memory requires a dedicated TDX module API, tdh_mem_page_add, instead of
directly copying the memory contents into the guest memory as is done for
the default VM type. The KVM MMU page fault handler callback,
private_page_add, handles it.

Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of the VM-scoped
KVM_MEMORY_ENCRYPT_OP. It assigns the guest page, copies the initial
memory contents into the guest memory, and encrypts the guest memory. At
the same time, it optionally extends the memory measurement of the TDX
guest. It calls the KVM MMU page fault (EPT-violation) handler to trigger
the callbacks for it.

Reported-by: gkirkpatrick@xxxxxxxxxx
Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

---
v15 -> v16:
- add a check that nr_pages isn't too large, using
(nr_pages << PAGE_SHIFT) >> PAGE_SHIFT

v14 -> v15:
- add a check if TD is finalized or not to tdx_init_mem_region()
- return -EAGAIN on partial population
---
arch/x86/include/uapi/asm/kvm.h | 9 ++
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/vmx/tdx.c | 167 +++++++++++++++++++++++++-
arch/x86/kvm/vmx/tdx.h | 2 +
tools/arch/x86/include/uapi/asm/kvm.h | 9 ++
5 files changed, 185 insertions(+), 3 deletions(-)
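
The v15 -> v16 changelog entry above mentions a shift-and-shift-back check on nr_pages. The hunk that implements it is not quoted here, so the following is only a minimal sketch of how I read that idiom. SKETCH_PAGE_SHIFT and nr_pages_fits() are hypothetical names, and the 64-bit width of nr_pages is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for the kernel's PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Return true if nr_pages can be turned into a byte count without losing
 * high bits. Shifting left and then right drops any bits that overflowed,
 * so the round trip only reproduces nr_pages when nothing was lost.
 */
static bool nr_pages_fits(uint64_t nr_pages)
{
	return ((nr_pages << SKETCH_PAGE_SHIFT) >> SKETCH_PAGE_SHIFT) == nr_pages;
}
```

With these assumptions, any nr_pages below 2^52 passes and anything at or above it is rejected, which is presumably the point of the check before computing gpa + (nr_pages << PAGE_SHIFT).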

[...]
+static int tdx_sept_page_add(struct kvm *kvm, gfn_t gfn,
+ enum pg_level level, kvm_pfn_t pfn)

For me, the function name is a bit confusing.
Going by the name alone, I would associate it with a SEPT table page rather than a normal private page.

The same applies to tdx_sept_page_aug(), though that one is less confusing because there is no SEAMCALL to augment a SEPT table page.


+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+ hpa_t hpa = pfn_to_hpa(pfn);
+ gpa_t gpa = gfn_to_gpa(gfn);
+ struct tdx_module_args out;
+ hpa_t source_pa;
+ bool measure;
+ u64 err;
+
+ /*
+ * KVM_TDX_INIT_MEM_REGION, tdx_init_mem_region(), supports only 4K pages
+ * because tdh_mem_page_add() supports only 4K pages.
+ */
+ if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
+ return -EINVAL;
+
+ /*
+ * In case of TDP MMU, fault handler can run concurrently. Note
+ * 'source_pa' is a TD-scoped variable, meaning that if multiple
+ * threads reached here and all needed to access 'source_pa', it
+ * would break. Fortunately this won't happen, because the
+ * TDH_MEM_PAGE_ADD code path below is only used while the VM is being
+ * created, before it runs, via the KVM_TDX_INIT_MEM_REGION ioctl (which
+ * always uses vcpu 0's page table and is protected by vcpu->mutex).
+ */
+ if (KVM_BUG_ON(kvm_tdx->source_pa == INVALID_PAGE, kvm)) {
+ tdx_unpin(kvm, pfn);
+ return -EINVAL;
+ }
+
+ source_pa = kvm_tdx->source_pa & ~KVM_TDX_MEASURE_MEMORY_REGION;
+ measure = kvm_tdx->source_pa & KVM_TDX_MEASURE_MEMORY_REGION;
+ kvm_tdx->source_pa = INVALID_PAGE;
+
+ do {
+ err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, hpa, source_pa,
+ &out);
+ /*
+ * This path is executed while populating the initial guest memory
+ * image, i.e. before running any vcpu. Races are rare.
+ */
+ } while (unlikely(err == TDX_ERROR_SEPT_BUSY));
+ if (KVM_BUG_ON(err, kvm)) {
+ pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out);
+ tdx_unpin(kvm, pfn);
+ return -EIO;
+ } else if (measure)
+ tdx_measure_page(kvm_tdx, gpa);
+
+ return 0;
+
+}
+
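
As an aside on how 'source_pa' is consumed in tdx_sept_page_add() above: the code masks KVM_TDX_MEASURE_MEMORY_REGION out of the address and also tests it as a flag, which only works if the flag occupies bits that are always zero in a page-aligned address. A minimal sketch of that packing follows. The value of the flag is not shown in the quoted hunk, so SKETCH_MEASURE_FLAG being bit 0 is an assumption, and all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in; the real flag is defined elsewhere in the series. */
#define SKETCH_MEASURE_FLAG UINT64_C(1)	/* assumed to be bit 0 */

/* Pack a page-aligned address and the measure flag into one word. */
static uint64_t pack_source(uint64_t page_aligned_pa, bool measure)
{
	return page_aligned_pa | (measure ? SKETCH_MEASURE_FLAG : 0);
}

/* Recover the address by clearing the flag bit. */
static uint64_t unpack_pa(uint64_t packed)
{
	return packed & ~SKETCH_MEASURE_FLAG;
}

/* Recover the flag by testing the flag bit. */
static bool unpack_measure(uint64_t packed)
{
	return (packed & SKETCH_MEASURE_FLAG) != 0;
}
```

The round trip is lossless only while the address really is page aligned, so whoever stores the packed value has to guarantee that alignment.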
[...]
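
For reference, the do/while around tdh_mem_page_add() in tdx_sept_page_add() is a plain retry-while-busy loop with no bound and no backoff. A standalone sketch of that shape is below. SKETCH_ERR_BUSY, retry_while_busy() and busy_n_times() are hypothetical names for illustration, not TDX module or KVM interfaces.

```c
#include <stdint.h>

/* Hypothetical busy status, standing in for TDX_ERROR_SEPT_BUSY. */
#define SKETCH_ERR_BUSY UINT64_C(1)

typedef uint64_t (*sketch_op_t)(void *ctx);

/* Call op until it returns something other than the busy status. */
static uint64_t retry_while_busy(sketch_op_t op, void *ctx)
{
	uint64_t err;

	do {
		err = op(ctx);
	} while (err == SKETCH_ERR_BUSY);

	return err;
}

/* Test double: reports busy until *remaining reaches zero, then succeeds. */
static uint64_t busy_n_times(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return SKETCH_ERR_BUSY;
	}
	return 0;
}
```

An unbounded loop like this is only reasonable when contention is expected to be short lived, which matches the comment in the patch that this path runs before any vcpu is running.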