[RFC PATCH 0/8] KVM: Prepopulate guest memory API

From: isaku . yamahata
Date: Fri Mar 01 2024 - 12:29:52 EST


From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

The objective of this RFC patch series is to develop a uAPI aimed at
(pre)populating guest memory for various use cases and underlying VM
technologies.

- Pre-populate guest memory to mitigate excessive KVM page faults during guest
boot [1], a need not limited to any specific technology.

- Pre-populating guest memory (including encryption and measurement) for
confidential guests [2]. SEV-SNP, TDX, and SW-PROTECTED VM. Potentially
other technologies and pKVM.

The patches are organized as follows.
- 1: documentation on uAPI KVM_MAP_MEMORY.
- 2: archtechture-independent implementation part.
- 3-4: refactoring of x86 KVM MMU as preparation.
- 5: x86 Helper function to map guest page.
- 6: x86 KVM arch implementation.
- 7: Add x86-ops necessary for TDX and SEV-SNP.
- 8: selftest for validation.

Discussion point:

uAPI design:
- access flags
Access flags are needed for the guest memory population. We have options for
their exposure to uAPI.
- option 1. Introduce access flags, possibly with the addition of a private
access flag.
- option 2. Omit access flags from UAPI.
Allow the kernel to deduce the necessary flag based on the memory slot and
its memory attributes.

- SEV-SNP and byte vs. page size
The SEV correspondence is SEV_LAUNCH_UPDATE_DATA. Which dictates memory
regions to be in 16-byte alignment, not page size. Should we define struct
kvm_memory_mapping in bytes rather than page size?

struct kvm_sev_launch_update_data {
__u64 uaddr;
__u32 len;
};

- TDX and measurement
The TDX correspondence is TDH.MEM.PAGE.ADD and TDH.MR.EXTEND. TDH.MEM.EXTEND
extends its measurement by the page contents.
Option 1. Add an additional flag like KVM_MEMORY_MAPPING_FLAG_EXTEND to issue
TDH.MEM.EXTEND
Option 2. Don't handle extend. Let TDX vendor specific API
KVM_EMMORY_ENCRYPT_OP to handle it with the subcommand like
KVM_TDX_EXTEND_MEMORY.

- TDX and struct kvm_memory_mapping:source
While the current patch series doesn't utilize
kvm_memory_mapping::source member. TDX needs it to specify the source of
memory contents.

Implementation:
- x86 KVM MMU
In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault(). It's not confined to
KVM TDP MMU. We can restrict it to KVM TDP MMU and introduce an optimized
version.

[1] https://lore.kernel.org/all/65262e67-7885-971a-896d-ad9c0a760907@xxxxxxxxx/
[2] https://lore.kernel.org/all/6a4c029af70d41b63bcee3d6a1f0c2377f6eb4bd.1690322424.git.isaku.yamahata@xxxxxxxxx

Thanks,

Isaku Yamahata (8):
KVM: Document KVM_MAP_MEMORY ioctl
KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory
KVM: x86/mmu: Introduce initialier macro for struct kvm_page_fault
KVM: x86/mmu: Factor out kvm_mmu_do_page_fault()
KVM: x86/mmu: Introduce kvm_mmu_map_page() for prepopulating guest
memory
KVM: x86: Implement kvm_arch_{, pre_}vcpu_map_memory()
KVM: x86: Add hooks in kvm_arch_vcpu_map_memory()
KVM: selftests: x86: Add test for KVM_MAP_MEMORY

Documentation/virt/kvm/api.rst | 36 +++++
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 6 +
arch/x86/kvm/mmu.h | 3 +
arch/x86/kvm/mmu/mmu.c | 30 ++++
arch/x86/kvm/mmu/mmu_internal.h | 70 +++++----
arch/x86/kvm/x86.c | 83 +++++++++++
include/linux/kvm_host.h | 4 +
include/uapi/linux/kvm.h | 15 ++
tools/include/uapi/linux/kvm.h | 14 ++
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/map_memory_test.c | 136 ++++++++++++++++++
virt/kvm/kvm_main.c | 74 ++++++++++
13 files changed, 448 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/map_memory_test.c


base-commit: 6a108bdc49138bcaa4f995ed87681ab9c65122ad
--
2.25.1