Re: [RFC PATCH 0/8] KVM: Prepopulate guest memory API

From: Michael Roth
Date: Sun Mar 10 2024 - 23:22:00 EST


On Fri, Mar 01, 2024 at 09:28:42AM -0800, isaku.yamahata@xxxxxxxxx wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> The objective of this RFC patch series is to develop a uAPI aimed at
> (pre)populating guest memory for various use cases and underlying VM
> technologies.
>
> - Pre-populate guest memory to mitigate excessive KVM page faults during guest
> boot [1], a need not limited to any specific technology.
>
> - Pre-populating guest memory (including encryption and measurement) for
> confidential guests [2]. SEV-SNP, TDX, and SW-PROTECTED VM. Potentially
> other technologies and pKVM.
>
> The patches are organized as follows.
> - 1: documentation on uAPI KVM_MAP_MEMORY.
> - 2: archtechture-independent implementation part.
> - 3-4: refactoring of x86 KVM MMU as preparation.
> - 5: x86 Helper function to map guest page.
> - 6: x86 KVM arch implementation.
> - 7: Add x86-ops necessary for TDX and SEV-SNP.
> - 8: selftest for validation.
>
> Discussion point:
>
> uAPI design:
> - access flags
> Access flags are needed for the guest memory population. We have options for
> their exposure to uAPI.
> - option 1. Introduce access flags, possibly with the addition of a private
> access flag.
> - option 2. Omit access flags from UAPI.
> Allow the kernel to deduce the necessary flag based on the memory slot and
> its memory attributes.
>
> - SEV-SNP and byte vs. page size
> The SEV correspondence is SEV_LAUNCH_UPDATE_DATA. Which dictates memory
> regions to be in 16-byte alignment, not page size. Should we define struct
> kvm_memory_mapping in bytes rather than page size?

For SNP it's multiples of page size only, and I'm not sure it's worth
trying to work in legacy SEV support, since that pull in other
requirements like how the backing memory also needs to be pre-pinned via
a separate KVM ioctl that would need to be documented as an SEV-specific
requirement for this interface... it just doesn't seem worth it. And SEV
would still benefit from the more basic functionality of pre-mapping pages
just like any other guest so it still benefits in that regard.

That said, it would be a shame if we needed to clutter up the API as
soon as some user can along that required non-page-sized values. That
seems unlikely for the pre-mapping use case, but for CoCo maybe it's
worth checking if pKVM would have any requirements like that? Or just
going with byte-size parameters to play it safe?

>
> struct kvm_sev_launch_update_data {
> __u64 uaddr;
> __u32 len;
> };
>
> - TDX and measurement
> The TDX correspondence is TDH.MEM.PAGE.ADD and TDH.MR.EXTEND. TDH.MEM.EXTEND
> extends its measurement by the page contents.
> Option 1. Add an additional flag like KVM_MEMORY_MAPPING_FLAG_EXTEND to issue
> TDH.MEM.EXTEND
> Option 2. Don't handle extend. Let TDX vendor specific API
> KVM_EMMORY_ENCRYPT_OP to handle it with the subcommand like
> KVM_TDX_EXTEND_MEMORY.

For SNP this happens unconditionally via SNP_LAUNCH_UPDATE, and with some
additional measurements via SNP_LAUNCH_FINISH, and down the road when live
migration support is added that flow will be a bit different. So
personally I think it's better to leave separate for now. Maybe down the
road some common measurement/finalize interface can deprecate some of
the other MEMORY_ENCRYPT ioctls.

Even with this more narrowly-defined purpose it's really unfortunate we
have to bake any CoCo stuff into this interface at all. It would be great
if all it did was simply pre-map entries into nested page tables to boost
boot performance, and we had some separate CoCo-specific API to handle
intial loading/encryption of guest data. I understand with SecureEPT
considerations why we sort of need it here for TDX, but it already ends
up being awkward for the SNP_LAUNCH_UPDATE use-case because there's not
really any good reason for that handling to live inside KVM MMU hooks
like with TDX, so we'll probably end up putting it in a pre/post hook
where all the handling is completely separate from TDX flow and in the
process complicating the userspace API to with the additional parameters
needed for that even though other guest types are likely to never need
them.

(Alternatively we can handle SNP_LAUNCH_UPDATE via KVM MMU hooks like with
tdx_mem_page_add(), but then we'll need to plumb additional parameters
down into the KVM MMU code for stuff like the SNP page type. But maybe
it's worth it to have some level of commonality for x86 at least?)

But I'd be hesitant to bake more requirements into this pre-mapping
interface, it feels like we're already overloading it as is.

>
> - TDX and struct kvm_memory_mapping:source
> While the current patch series doesn't utilize
> kvm_memory_mapping::source member. TDX needs it to specify the source of
> memory contents.

SNP will need it too FWIW.

-Mike

>
> Implementation:
> - x86 KVM MMU
> In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault(). It's not confined to
> KVM TDP MMU. We can restrict it to KVM TDP MMU and introduce an optimized
> version.
>
> [1] https://lore.kernel.org/all/65262e67-7885-971a-896d-ad9c0a760907@xxxxxxxxx/
> [2] https://lore.kernel.org/all/6a4c029af70d41b63bcee3d6a1f0c2377f6eb4bd.1690322424.git.isaku.yamahata@xxxxxxxxx
>
> Thanks,
>
> Isaku Yamahata (8):
> KVM: Document KVM_MAP_MEMORY ioctl
> KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory
> KVM: x86/mmu: Introduce initialier macro for struct kvm_page_fault
> KVM: x86/mmu: Factor out kvm_mmu_do_page_fault()
> KVM: x86/mmu: Introduce kvm_mmu_map_page() for prepopulating guest
> memory
> KVM: x86: Implement kvm_arch_{, pre_}vcpu_map_memory()
> KVM: x86: Add hooks in kvm_arch_vcpu_map_memory()
> KVM: selftests: x86: Add test for KVM_MAP_MEMORY
>
> Documentation/virt/kvm/api.rst | 36 +++++
> arch/x86/include/asm/kvm-x86-ops.h | 2 +
> arch/x86/include/asm/kvm_host.h | 6 +
> arch/x86/kvm/mmu.h | 3 +
> arch/x86/kvm/mmu/mmu.c | 30 ++++
> arch/x86/kvm/mmu/mmu_internal.h | 70 +++++----
> arch/x86/kvm/x86.c | 83 +++++++++++
> include/linux/kvm_host.h | 4 +
> include/uapi/linux/kvm.h | 15 ++
> tools/include/uapi/linux/kvm.h | 14 ++
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/x86_64/map_memory_test.c | 136 ++++++++++++++++++
> virt/kvm/kvm_main.c | 74 ++++++++++
> 13 files changed, 448 insertions(+), 26 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/x86_64/map_memory_test.c
>
>
> base-commit: 6a108bdc49138bcaa4f995ed87681ab9c65122ad
> --
> 2.25.1
>
>