[PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

From: Michael Roth
Date: Sat Dec 30 2023 - 12:25:47 EST


This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v11

and is based on top of the following series:

"[PATCH v1] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support"
https://lore.kernel.org/kvm/20231230161954.569267-1-michael.roth@xxxxxxx/

which in turn is based on linux-next tag next-20231222

The host initialization patches have been split off to a separate series
as noted above to more easily shepherd them into tip/x86, while this series
now focuses on KVM support. Additionally, the gmem RFC[1] that this series
was previously based on is now included for better visibility and to provide
more context. However, please see the RFC link for background on why the
gmem changes/hooks are implemented in their current form.

[1] https://lore.kernel.org/kvm/20231016115028.996656-1-michael.roth@xxxxxxx/


== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux. It
relies on the SNP host initialization patches noted above, which were
included in this series up until v10 but are now posted separately for
inclusion into the x86 tree, while this series is targeted at the x86 KVM
tree. Both patchsets are based on linux-next to hopefully make it easier
to coordinate and test against tip and kvm-next.


== TESTING ==

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-v3-wip

SEV-SNP with gmem enabled:

qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd,id=ram1,size=2G,share=true,prealloc=false,reserve=false \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
...

KVM selftests for guest_memfd / KVM_GENERIC_PRIVATE_MEM:

cd $kernel_src_dir
make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. The series builds upon
SEV-SNP Guest Support, now part of mainline, and a separate series that
implements basic host initialization requirements for SNP-enabled systems.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all the security enhancements introduced by SEV-SNP,
such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempt to map such a page to a different guest/ASID, or to a different GFN
within the same guest/ASID, will result in an RMP nested page fault.
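
The ownership rule above can be pictured with a small user-space model. This
is only a sketch under stated assumptions: struct rmp_entry_model, its field
names, and rmp_check_mapping() are hypothetical and do not reflect the real
RMP entry layout or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified RMP entry; NOT the hardware layout. */
struct rmp_entry_model {
	bool     assigned;	/* true: guest-owned, false: hypervisor-owned */
	uint32_t asid;		/* owning guest, meaningful when assigned */
	uint64_t gfn;		/* guest frame the page is bound to */
};

enum rmp_result_model { RMP_MODEL_OK, RMP_MODEL_NPF };

/*
 * Model of the check applied when a guest-owned page is mapped into a
 * guest: both the ASID and the GFN must match what the entry records,
 * otherwise the access ends in an RMP nested page fault.
 */
static inline enum rmp_result_model
rmp_check_mapping(const struct rmp_entry_model *e, uint32_t asid, uint64_t gfn)
{
	assert(e != NULL);

	/* Not guest-owned: the ownership rule is out of scope for this model. */
	if (!e->assigned)
		return RMP_MODEL_OK;

	if (e->asid != asid || e->gfn != gfn)
		return RMP_MODEL_NPF;

	return RMP_MODEL_OK;
}
```

In this model, a page assigned to ASID 5 at GFN 0x1000 passes the check only
for that exact pair; changing either the ASID or the GFN yields
RMP_MODEL_NPF.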

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction which will set a special bit in the RMP table
for the guest. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempts outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page so it can be made aware of possible tampering.
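
As a rough illustration of the validated-bit lifecycle described above, the
following sketch models the three events involved: the guest validating a
page, the host modifying the RMP entry, and the guest accessing the page. All
names here (struct rmp_page_model, guest_pvalidate(), host_modify_rmp(),
guest_access_ok()) are hypothetical stand-ins, not instructions or kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; only the bits needed for this sketch. */
struct rmp_page_model {
	bool assigned;	/* page is guest-owned */
	bool validated;	/* set by the guest, cleared on host modification */
};

/* Models the guest issuing PVALIDATE on a page it owns. */
static inline void guest_pvalidate(struct rmp_page_model *p)
{
	assert(p != NULL && p->assigned);
	p->validated = true;
}

/* Models any change to the RMP entry made from outside the guest. */
static inline void host_modify_rmp(struct rmp_page_model *p)
{
	assert(p != NULL);
	p->validated = false;
}

/*
 * Models a guest access to a private page: returns false where the guest
 * would instead take an exception and learn of possible tampering.
 */
static inline bool guest_access_ok(const struct rmp_page_model *p)
{
	assert(p != NULL);
	return p->assigned && p->validated;
}
```

The sequence pvalidate, access, host modification, access therefore goes from
a successful access to a failing one, which is the signal the guest relies on.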

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.
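
To make the userspace side of this flow more concrete, here is a minimal
sketch of how a Page State Change request might be translated into the
address/size/attributes triple that a KVM_SET_MEMORY_ATTRIBUTES call
describes. The structs, the 4K page shift, and MODEL_ATTR_PRIVATE are
placeholders invented for illustration; a real VMM would use the definitions
from <linux/kvm.h> and the exit payload defined by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12		/* assumes 4K pages */
#define MODEL_ATTR_PRIVATE	(1ULL << 0)	/* placeholder bit, not the uapi value */

/* Hypothetical view of a Page State Change request seen by userspace. */
struct psc_request_model {
	uint64_t gfn;		/* first guest frame in the range */
	uint64_t npages;	/* number of 4K pages to convert */
	bool     to_private;	/* true: shared -> private */
};

/* Hypothetical mirror of the arguments passed to the attributes ioctl. */
struct mem_attr_model {
	uint64_t address;	/* guest physical address of the range */
	uint64_t size;		/* length of the range in bytes */
	uint64_t attributes;	/* private bit set or cleared */
};

/* Convert a PSC request into the range and attribute to hand to KVM. */
static inline struct mem_attr_model
psc_to_mem_attr(const struct psc_request_model *req)
{
	struct mem_attr_model attr;

	assert(req != NULL && req->npages > 0);

	attr.address = req->gfn << MODEL_PAGE_SHIFT;
	attr.size = req->npages << MODEL_PAGE_SHIFT;
	attr.attributes = req->to_private ? MODEL_ATTR_PRIVATE : 0;
	return attr;
}
```

A VMM following this pattern would fill in the real ioctl argument from the
returned values and then resume the vCPU once the call succeeds.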

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.
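
The split between the private and shared mapping paths can be sketched as
follows. This is a simplified model, not the hook implemented in the series:
struct rmp_state_model and map_page_model() are hypothetical names, and the
real code operates on backing PFNs and RMP entries rather than on a plain
struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RMP state for one backing page. */
struct rmp_state_model {
	bool     assigned;	/* false is the default shared/hypervisor-owned state */
	uint32_t asid;
	uint64_t gfn;
};

/*
 * Model of what happens when a page is mapped into the guest:
 *  - private fault: the RMP entry is switched to guest-owned and bound
 *    to the guest's ASID and the faulting GFN;
 *  - shared fault: the entry is left untouched in its default state.
 */
static inline void map_page_model(struct rmp_state_model *rmp, bool is_private,
				  uint32_t asid, uint64_t gfn)
{
	assert(rmp != NULL);

	if (!is_private)
		return;

	rmp->assigned = true;
	rmp->asid = asid;
	rmp->gfn = gfn;
}
```

Mapping the same page as shared leaves the zero-initialized state in place,
while mapping it as private records the owner and GFN.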

Feedback/review is very much appreciated!

-Mike

Changes since v10:

* Split off host initialization patches to separate series
* Drop SNP_{SET,GET}_EXT_CONFIG SEV ioctls, and drop
KVM_SEV_SNP_{SET,GET}_CERTS KVM ioctls. Instead, all certificate data is
now fetched from userspace as part of a new KVM_EXIT_VMGEXIT event type.
(Sean, Dionna)
* SNP_SET_EXT_CONFIG is now replaced with a more basic SNP_SET_CONFIG,
which is now just a light wrapper around the SNP_CONFIG firmware command,
and SNP_GET_EXT_CONFIG is now redundant with existing SNP_PLATFORM_STATUS,
so just stick with that interface
* Introduce SNP_SET_CONFIG_{START,END}, which can be used to pause extended
guest requests while reported TCB / certificates are being updated so
the updates are done atomically relative to running guests.
* Improve documentation for KVM_EXIT_VMGEXIT event types and tighten down
the expected input/output for union types rather than exposing GHCB
page/MSR
* Various re-factorings, commit/comments fixups (Boris, Liam, Vlastimil)
* Make CONFIG_KVM_AMD_SEV depend on KVM_GENERIC_PRIVATE_MEM instead of
CONFIG_KVM_SW_PROTECTED_VM (Paolo)
* Include Sean's patch to add hugepage support to gmem, but modify it based
on discussions to be best-effort and not rely on an explicit flag

Changes since v9:

* Split off gmem changes to separate RFC series, drop RFC tag from this series
* Use 2M RMPUPDATE instructions whenever possible when invalidating/releasing
gmem pages
* Tighten up RMP #NPF handling to better differentiate spurious cases from
unexpected behavior
* Simplify/optimize logic for determining when 2M NPT private mappings are
possible
* Be more consistent with PFN data types and stub return values (Dave)
* Reduce potential flooding from frequently-printed pr_debug()'s (Dave)
* Use existing #PF handling paths to catch illegal userspace-generated RMP
faults (Dave)
* Improve host kexec/kdump support (Ashish)
* Reduce overhead from unnecessary WBINVD via MMU notifiers (Ashish)
* Avoid host crashes during CCP module probe if SNP_INIT* is issued while
guests are running (Tom L.)
* Simplify AutoIBRS disablement (Kim, Dave)
* Avoid unnecessary zero'ing in extended guest requests (Alexey)
* Fix padding in struct sev_user_data_ext_snp_config (Alexey)
* Report AP creation failures via GHCB error codes rather than inducing #GP in
guest (Peter)
* Disallow multiple allocations of snp_context via userspace (Peter)
* Error out on unsupported SNP policy bits (Tom)
* Fix snp_leak_pages() stub (Jeremi)
* Use C99 flexible arrays where appropriate
* Use helper to handle HVA->PFN conversions prior to dumping RMP entries (Dave)
* Don't potentially print out all 512 entries when dumping 2MB RMP range (Dave)
* Don't use a union to dump raw RMP entries, just cast at dump-site (Dave)
* Don't use helpers to access RMP entry bitfields, use them directly (Dave)
* Simplify logic and improve comments for AutoIBRS disablement (Dave)

# Changes that were split off to separate gmem series
* Use KVM_X86_SNP_VM to implement SNP-specific checks on whether a fault was
shared/private and drop the duplicate memslot lookup (Isaku, Sean)
* Use Isaku's version of patch to plumb 64-bit #NPF error code (Isaku)
* Fix up stub for kvm_arch_gmem_invalidate() (Boris)

----------------------------------------------------------------
Ashish Kalra (1):
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Brijesh Singh (14):
KVM: x86: Define RMP page fault error bits for #NPF
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add KVM_SNP_INIT command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable the SEV-SNP
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (15):
mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
KVM: Use AS_INACCESSIBLE when creating guest_memfd inode
KVM: x86: Add gmem hook for initializing memory
KVM: x86: Add gmem hook for invalidating memory
KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
KVM: x86: Add KVM_X86_SNP_VM vm_type
KVM: x86: Determine shared/private faults based on vm_type
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add support for GHCB-based termination requests
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Add gmem hook for determining max NPT mapping level
crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Sean Christopherson (1):
KVM: Add hugepage support for dedicated guest memory

Tom Lendacky (3):
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Use a VMSA physical address variable for populating VMCB
KVM: SEV: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (1):
KVM: Add HVA range operator

Documentation/virt/coco/sev-guest.rst | 33 +-
Documentation/virt/kvm/api.rst | 73 ++
.../virt/kvm/x86/amd-memory-encryption.rst | 103 ++
arch/x86/include/asm/kvm-x86-ops.h | 3 +
arch/x86/include/asm/kvm_host.h | 15 +
arch/x86/include/asm/sev-common.h | 22 +-
arch/x86/include/asm/sev.h | 11 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 28 +-
arch/x86/kvm/mmu/mmu_internal.h | 24 +-
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/svm/sev.c | 1362 +++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 38 +-
arch/x86/kvm/svm/svm.h | 40 +-
arch/x86/kvm/x86.c | 44 +-
arch/x86/virt/svm/sev.c | 51 +
drivers/crypto/ccp/sev-dev.c | 44 +
include/linux/kvm_host.h | 24 +
include/linux/pagemap.h | 1 +
include/uapi/linux/kvm.h | 84 ++
include/uapi/linux/psp-sev.h | 12 +
include/uapi/linux/sev-guest.h | 9 +
mm/truncate.c | 3 +-
virt/kvm/Kconfig | 8 +
virt/kvm/guest_memfd.c | 132 +-
virt/kvm/kvm_main.c | 49 +
30 files changed, 2175 insertions(+), 54 deletions(-)