[PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

From: Michael Roth
Date: Mon Oct 16 2023 - 09:30:03 EST


This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v10

and is based on top of the following series:

"[PATCH RFC gmem v1 0/8] KVM: gmem hooks/changes needed for x86 (other archs?)"
https://lore.kernel.org/kvm/20231016115028.996656-1-michael.roth@xxxxxxx/

which in turn is based on the KVM-x86 staging tree for guest_memfd:

https://github.com/kvm-x86/linux/commits/guest_memfd


== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux. It
relies on the gmem changes noted above, which are still in an RFC
state, but other than those aspects, the series is being targeted for
inclusion in the KVM x86 tree to support running SEV-SNP guests on AMD
EPYC systems utilizing Zen 3 and newer microarchitectures.

More details on what SEV-SNP is and how it works are available below
under "BACKGROUND".


== PATCH LAYOUT ==

PATCH 01-02: Dependencies for patch #3 that are already upstream but not in
current guest_memfd staging tree
PATCH 03 : General SEV-ES fix for MSR_IA32_XSS interception that fixes a
minor bug for SEV-ES, but a more severe one for SNP guests.
Planning to also submit this separately as an SEV-ES fix.
PATCH 04-19: Host SNP initialization code and CCP driver prep for handling
SNP cmds
PATCH 20-43: General SNP enablement for KVM and CCP driver
PATCH 47-50: Misc handling for IOMMU support, guest request handling, debug
infrastructure, and kdump-related handling.


== TESTING ==

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-latest-gmem-v12

SEV-SNP with gmem enabled:

# set discard=none to disable discarding memory post-conversion, faster
# boot times, but increased memory usage
qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd-private,id=ram1,size=2G,share=true \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,discard=both \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
...

KVM selftests for UPM:

cd $kernel_src_dir
make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support for handling the following new NAE events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempts to map it in the RMP table to a different guest/ASID, or a
different GFN within a guest/ASID, will result in an RMP nested page fault.
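
The ownership rule above can be pictured with a small toy model. This is
only an illustrative sketch: the struct, its fields, and toy_rmp_check()
are hypothetical names that do not reflect the hardware RMP entry layout
or any helper added by this series, and the model only covers private
(encrypted) guest mappings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one RMP entry (not the hardware layout). */
struct toy_rmp_entry {
	bool     assigned;	/* true: guest-owned, false: shared/hypervisor-owned */
	uint32_t asid;		/* owning guest, meaningful only when assigned */
	uint64_t gfn;		/* guest frame the page is bound to */
};

/*
 * Model of the check for a private mapping of a page into a guest.
 * Returns true when the mapping matches the recorded owner, and false in
 * the cases where the text above says an RMP nested page fault results.
 */
static inline bool toy_rmp_check(const struct toy_rmp_entry *e,
				 uint32_t asid, uint64_t gfn)
{
	if (!e->assigned)
		return false;	/* modelling assumption: not guest-owned */
	if (e->asid != asid)
		return false;	/* page belongs to a different guest/ASID */
	if (e->gfn != gfn)
		return false;	/* same guest, but bound to a different GFN */
	return true;
}
```

For example, an entry assigned to ASID 5 at GFN 0x1000 passes the check only
for that exact (ASID, GFN) pair in this model.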

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction which will set a special bit in the RMP table
for the guest. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempts outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page so it can be made aware of possible tampering.
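
As a rough illustration of the validated-bit life cycle described above, the
following toy state machine may help. All names here are hypothetical; it
does not model the real PVALIDATE operands, the RMP entry format, or the
exact exception delivered to the guest, and it assumes validation only
succeeds on a guest-owned page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state; not the hardware RMP entry format. */
struct toy_page_state {
	bool     assigned;	/* guest-owned */
	bool     validated;	/* set only from inside the guest */
	uint32_t asid;
	uint64_t gfn;
};

/* Guest side: models PVALIDATE setting the validated bit. */
static inline bool toy_pvalidate(struct toy_page_state *p)
{
	if (!p->assigned)
		return false;	/* modelling assumption: nothing to validate */
	p->validated = true;
	return true;
}

/*
 * Host side: models any modification of the entry from outside the guest.
 * Whatever else changes, the validated bit is always cleared.
 */
static inline void toy_host_rmp_update(struct toy_page_state *p, bool assigned,
				       uint32_t asid, uint64_t gfn)
{
	p->assigned = assigned;
	p->asid = asid;
	p->gfn = gfn;
	p->validated = false;
}

/* Guest private access: false models the exception the guest would take. */
static inline bool toy_guest_access_ok(const struct toy_page_state *p)
{
	return p->assigned && p->validated;
}
```

In this model a page becomes accessible only after toy_pvalidate(), and any
later toy_host_rmp_update() makes the next guest access fail again, which is
how the guest learns about possible tampering.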

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.
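
A minimal sketch of the userspace side of that flow is shown below, assuming
a Page State Change request has already been decoded into a starting GFN, a
page count, and a target state. The struct, the attribute bit value, and the
helper name are placeholders invented for illustration; the real argument
layout and flag values should be taken from the uapi headers of the tree
being built.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12
/* Placeholder bit; consult the uapi header for the real attribute value. */
#define TOY_ATTR_PRIVATE	(1ULL << 3)

/* Hypothetical stand-in for the KVM_SET_MEMORY_ATTRIBUTES argument. */
struct toy_mem_attrs {
	uint64_t address;	/* guest physical address of the range start */
	uint64_t size;		/* length of the range in bytes */
	uint64_t attributes;	/* TOY_ATTR_PRIVATE or 0 (shared) */
};

/*
 * Translate a decoded Page State Change request into the range and
 * attribute that userspace would hand to KVM_SET_MEMORY_ATTRIBUTES.
 */
static inline struct toy_mem_attrs toy_psc_to_attrs(uint64_t gfn,
						    uint64_t npages,
						    bool to_private)
{
	struct toy_mem_attrs a = {
		.address    = gfn << TOY_PAGE_SHIFT,
		.size       = npages << TOY_PAGE_SHIFT,
		.attributes = to_private ? TOY_ATTR_PRIVATE : 0,
	};

	return a;
}
```

A VMM would fill the real structure in a similar way and pass it to the
ioctl on the VM file descriptor before resuming the vCPU.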

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.
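
The changelog below mentions using 2M RMPUPDATE operations whenever
possible. One way to picture the eligibility rule is the sketch that follows.
It assumes, for illustration, that a 2M operation is possible only when both
the PFN and the GFN are aligned to 512 4K pages and at least 512 pages
remain; the function names are hypothetical and are not the helpers used by
this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES_PER_2M	512ULL	/* 4K pages covered by one 2M entry */

/* Assumed eligibility rule: aligned PFN, aligned GFN, full 2M remaining. */
static inline bool toy_can_use_2m(uint64_t pfn, uint64_t gfn, uint64_t npages)
{
	return (pfn % TOY_PAGES_PER_2M) == 0 &&
	       (gfn % TOY_PAGES_PER_2M) == 0 &&
	       npages >= TOY_PAGES_PER_2M;
}

/*
 * Count how many RMP updates a range would need when 2M operations are
 * used wherever the rule above allows and 4K operations are used otherwise.
 */
static inline uint64_t toy_count_rmp_updates(uint64_t pfn, uint64_t gfn,
					     uint64_t npages)
{
	uint64_t ops = 0;

	while (npages) {
		uint64_t step = toy_can_use_2m(pfn, gfn, npages) ?
				TOY_PAGES_PER_2M : 1;

		pfn += step;
		gfn += step;
		npages -= step;
		ops++;
	}

	return ops;
}
```

Under these assumptions a fully aligned 512-page range needs a single
update, while the same range with a misaligned GFN falls back to 512
individual 4K updates.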

Feedback/review is very much appreciated!

-Mike


Changes since v9:

* Split off gmem changes to separate RFC series, drop RFC tag from this series
* Use 2M RMPUPDATE instructions whenever possible when invalidating/releasing
gmem pages
* Tighten up RMP #NPF handling to better differentiate spurious cases from
unexpected behavior
* Simplify/optimize logic for determining when 2M NPT private mappings are
possible
* Be more consistent with PFN data types and stub return values (Dave)
* Reduce potential flooding from frequently-printed pr_debug()'s (Dave)
* Use existing #PF handling paths to catch illegal userspace-generated RMP
faults (Dave)
* Improve host kexec/kdump support (Ashish)
* Reduce overhead from unnecessary WBINVD via MMU notifiers (Ashish)
* Avoid host crashes during CCP module probe if SNP_INIT* is issued while
guests are running (Tom L.)
* Simplify AutoIBRS disablement (Kim, Dave)
* Avoid unnecessary zero'ing in extended guest requests (Alexey)
* Fix padding in struct sev_user_data_ext_snp_config (Alexey)
* Report AP creation failures via GHCB error codes rather than inducing #GP in
guest (Peter)
* Disallow multiple allocations of snp_context via userspace (Peter)
* Error out on unsupported SNP policy bits (Tom)
* Fix snp_leak_pages() stub (Jeremi)
* Use C99 flexible arrays where appropriate
* Use helper to handle HVA->PFN conversions prior to dumping RMP entries (Dave)
* Don't potentially print out all 512 entries when dumping 2MB RMP range (Dave)
* Don't use a union to dump raw RMP entries, just cast at dump-site (Dave)
* Don't use helpers to access RMP entry bitfields, use them directly (Dave)
* Simplify logic and improve comments for AutoIBRS disablement (Dave)

# Changes that were split off to separate gmem series
* Use KVM_X86_SNP_VM to implement SNP-specific checks on whether a fault was
shared/private and drop the duplicate memslot lookup (Isaku, Sean)
* Use Isaku's version of patch to plumb 64-bit #NPF error code (Isaku)
* Fix up stub for kvm_arch_gmem_invalidate() (Boris)

Changes since v8:

* Rework gmem/UPM hooks based on Sean's latest gmem/UPM tree
* Move SEV lazy-pinning support out to a separate series which uses this
series as a prereq instead of the other way around.
* Re-organize extended guest request patches into 3 patches encompassing
SEV FD ioctls for host-wide certs, KVM ioctls for per-instance certs,
and the guest request handling that consumes them. Also move them to
the top of the series to better separate them from the core SNP patches
(Alexey, Zhi, Ashish, Dov, Dionna, others)
* Various other changes/fixups for extended guest request handling (Dov,
Alexey, Dionna)
* Use helper to calculate max RMP entry size and improve readability (Dave)
* Use architecture-independent GPA value for initial VMSA pages
* Ensure SEV_CMD_SNP_GUEST_REQUEST failures are indicated to guest (Alex)
* Allocate per-instance certs on-demand (Alex)
* comment fixup for RMP fault handling (Zhi)
* commit msg rewording for MSR-based PSCs (Zhi)
* update SNP command/struct definitions based on 1.54 ABI (Saban)
* use sev_deactivate_lock around SEV_CMD_SNP_DECOMMISSION (Saban)
* Various comment/commit fixups (Zhi, Alex, Kim, Vlastimil, Dave,
* kexec fixes for newer SNP firmwares (Ashish)
* Various other fixups and re-ordering of patches.

----------------------------------------------------------------
Ashish Kalra (4):
x86/sev: Introduce snp leaked pages list
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
iommu/amd: Add IOMMU_SNP_SHUTDOWN support
crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump

Brijesh Singh (29):
x86/cpufeatures: Add SEV-SNP CPU feature
x86/sev: Add the host SEV-SNP initialization support
x86/sev: Add RMP entry lookup helpers
x86/fault: Add helper for dumping RMP entries
x86/traps: Define RMP violation #PF error code
x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
x86/sev: Invalidate pages from the direct map when adding them to the RMP table
crypto: ccp: Define the SEV-SNP commands
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
crypto: ccp: Provide API to issue SEV and SNP commands
crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
crypto: ccp: Handle the legacy SEV command when SNP is enabled
crypto: ccp: Add the SNP_PLATFORM_STATUS command
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add KVM_SNP_INIT command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable the SEV-SNP
crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
crypto: ccp: Add debug support for decrypting pages

Dionna Glaze (1):
x86/sev: Add KVM commands for per-instance certs

Kim Phillips (1):
x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

Michael Roth (9):
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
x86/fault: Report RMP page faults for kernel addresses
KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add KVM_EXIT_VMGEXIT
KVM: SEV: Add support for GHCB-based termination requests
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Add gmem hook for determining max NPT mapping level
iommu/amd: Report all cases inhibiting SNP enablement

Paolo Bonzini (1):
KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

Tom Lendacky (4):
KVM: SVM: Fix TSC_AUX virtualization setup
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Use a VMSA physical address variable for populating VMCB
KVM: SEV: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (1):
KVM: Add HVA range operator

Documentation/virt/coco/sev-guest.rst | 54 +
Documentation/virt/kvm/api.rst | 34 +
.../virt/kvm/x86/amd-memory-encryption.rst | 147 ++
arch/x86/Kbuild | 2 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/sev-common.h | 33 +
arch/x86/include/asm/sev-host.h | 37 +
arch/x86/include/asm/sev.h | 6 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/asm/trap_pf.h | 4 +
arch/x86/kernel/cpu/amd.c | 24 +-
arch/x86/kernel/cpu/common.c | 7 +-
arch/x86/kernel/crash.c | 7 +
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/lapic.c | 5 +-
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 13 +-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/sev.c | 1903 +++++++++++++++++---
arch/x86/kvm/svm/svm.c | 64 +-
arch/x86/kvm/svm/svm.h | 41 +-
arch/x86/kvm/x86.c | 11 +
arch/x86/mm/fault.c | 5 +
arch/x86/virt/svm/Makefile | 3 +
arch/x86/virt/svm/sev.c | 548 ++++++
drivers/crypto/ccp/sev-dev.c | 1253 ++++++++++++-
drivers/crypto/ccp/sev-dev.h | 16 +
drivers/iommu/amd/init.c | 65 +-
include/linux/amd-iommu.h | 5 +-
include/linux/kvm_host.h | 6 +
include/linux/psp-sev.h | 304 +++-
include/uapi/linux/kvm.h | 74 +
include/uapi/linux/psp-sev.h | 71 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
virt/kvm/kvm_main.c | 49 +
39 files changed, 4497 insertions(+), 335 deletions(-)
create mode 100644 arch/x86/include/asm/sev-host.h
create mode 100644 arch/x86/virt/svm/Makefile
create mode 100644 arch/x86/virt/svm/sev.c