RE: [PATCH v4 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

From: Michael Kelley
Date: Fri Feb 09 2024 - 10:51:30 EST

From: mhkelley58@xxxxxxxxx <mhkelley58@xxxxxxxxx> Sent: Monday, January 15, 2024 6:20 PM
> In a CoCo VM, when transitioning memory from encrypted to decrypted, or
> vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
> is responsible for ensuring the memory isn't in use and isn't referenced
> while the transition is in progress. The transition has multiple steps,
> and the memory is in an inconsistent state until all steps are complete.
> A reference while the state is inconsistent could result in an exception
> that can't be cleanly fixed up.
>
> However, the kernel load_unaligned_zeropad() mechanism could cause a stray
> reference that can't be prevented by the caller of set_memory_encrypted()
> or set_memory_decrypted(), so there's specific code to handle this case.
>
> But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
> with the #VC or #VE exception routed to the paravisor. There's no
> architectural way to forward the exceptions back to the guest kernel, and
> in such a case, the load_unaligned_zeropad() specific code doesn't work.
>
> To avoid this problem, mark pages as "not present" while a transition
> is in progress. If load_unaligned_zeropad() causes a stray reference, a
> normal page fault is generated instead of #VC or #VE, and the
> page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
> reference. When the encrypted/decrypted transition is complete, mark the
> pages as "present" again.
>
> This version of the patch series marks transitioning pages "not present"
> only when running as a Hyper-V guest with a paravisor. Previous
> versions[1] marked transitioning pages "not present" regardless of the
> hypervisor and regardless of whether a paravisor is in use. That more
> general use had the benefit of decoupling the load_unaligned_zeropad()
> fixup from CoCo VM #VE and #VC exception handling. But the implementation
> was problematic for SEV-SNP because the SEV-SNP hypervisor callbacks
> require a valid virtual address, not a physical address like with TDX and
> the Hyper-V paravisor. Marking the transitioning pages "not present"
> causes the virtual address to not be valid, and the PVALIDATE
> instruction in the SEV-SNP callback fails. Constructing a temporary
> virtual address for this purpose is slower and adds complexity that
> negates the benefits of the more general use. So this version narrows
> the applicability of the approach to just where it is required
> because of the #VC and #VE exceptions being routed to a paravisor.
>
> The previous version minimized the TLB flushing done during page
> transitions between encrypted and decrypted. Because this version
> marks the pages "not present" in hypervisor specific callbacks and
> not in __set_memory_enc_pgtable(), doing such optimization is more
> difficult to coordinate. But the page transitions are not a hot path,
> so this version eschews optimization of TLB flushing in favor of
> simplicity.
>
> Since this version no longer touches __set_memory_enc_pgtable(),
> I've also removed patches that add comments about error handling
> in that function. Rick Edgecombe has proposed patches to improve
> that error handling, and I'll leave those comments to Rick's
> patches.
>
> Patch 1 handles implications of the hypervisor callbacks needing
> to do virt-to-phys translations on pages that are temporarily
> marked not present.
>
> Patch 2 makes the existing set_memory_p() function available for
> use in the hypervisor callbacks.
>
> Patch 3 is the core change that marks the transitioning pages
> as not present.
>
> This patch set is based on the linux-next20240103 code tree.
>
> Changes in v4:
> * Patch 1: Updated comment in slow_virt_to_phys() to reduce the
>   likelihood of the comment becoming stale. The new comment
>   describes the requirement to work with leaf PTE not present,
>   but doesn't directly reference the CoCo hypervisor callbacks.
>   [Rick Edgecombe]
> * Patch 1: Decomposed a complex line-wrapped statement into
>   multiple statements for ease of understanding. No functional
>   change compared with v3. [Kirill Shutemov]
> * Patch 3: Fixed handling of memory allocation errors. [Rick
>   Edgecombe]
>
> Changes in v3:
> * Major rework and simplification per discussion above.
>
> Changes in v2:
> * Added Patches 3 and 4 to deal with the failure on SEV-SNP
>   [Tom Lendacky]
> * Split the main change into two separate patches (Patch 5 and
>   Patch 6) to improve reviewability and to offer the option of
>   retaining both hypervisor callbacks.
> * Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64
>   so that the code builds correctly for 32-bit, even though it
>   is never executed for 32-bit [reported by kernel test robot]
>
> [1]
> mhklinux@xxxxxxxxxxx/
>
> Michael Kelley (3):
>   x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
>     callback
>   x86/mm: Regularize set_memory_p() parameters and make non-static
>   x86/hyperv: Make encrypted/decrypted changes safe for
>     load_unaligned_zeropad()
>
>  arch/x86/hyperv/ivm.c             | 65 ++++++++++++++++++++++++++++---
>  arch/x86/include/asm/set_memory.h |  1 +
>  arch/x86/mm/pat/set_memory.c      | 24 +++++++-----
>  3 files changed, 75 insertions(+), 15 deletions(-)
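
For readers skimming the quoted cover letter, the ordering it describes
can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not code from the series: every
name below (struct page_model, stray_access(), change_encryption()) is
hypothetical, and the model reduces a page to three flags. It shows why
clearing the present bit first turns a stray reference during the
transition into an ordinary page fault rather than a #VC or #VE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum stray_result {
	STRAY_OK,         /* page is stable, the load succeeds */
	STRAY_PAGE_FAULT, /* ordinary #PF, which the zeropad fixup can handle */
	STRAY_VC_VE       /* #VC/#VE, routed to the paravisor, no guest fixup */
};

struct page_model {
	bool present;       /* models the PTE present bit */
	bool in_transition; /* encryption state is inconsistent */
	bool encrypted;     /* current encryption state */
};

/* Classify what a stray load (e.g. from load_unaligned_zeropad()) would hit. */
static enum stray_result stray_access(const struct page_model *pg)
{
	if (!pg->present)
		return STRAY_PAGE_FAULT; /* not-present wins: plain page fault */
	if (pg->in_transition)
		return STRAY_VC_VE;      /* mapped but inconsistent */
	return STRAY_OK;
}

/*
 * Change the encryption state of the modeled page and report what a stray
 * access arriving in the middle of the transition would have seen.
 */
static enum stray_result change_encryption(struct page_model *pg, bool encrypt,
					   bool mark_not_present)
{
	enum stray_result seen;

	assert(pg->present && !pg->in_transition);

	if (mark_not_present)
		pg->present = false;  /* step 1: hide the page */
	pg->in_transition = true;     /* step 2: multi-step state change begins */

	seen = stray_access(pg);      /* a stray reference lands here */

	pg->encrypted = encrypt;      /* step 3: state change completes */
	pg->in_transition = false;
	pg->present = true;           /* step 4: make the page visible again */
	return seen;
}
```

In this model, calling change_encryption() with mark_not_present set
yields STRAY_PAGE_FAULT for the mid-transition access, and calling it
with the flag clear yields STRAY_VC_VE, which is the case the series
avoids when a paravisor is in use.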

Wei --

Can this series go through the Hyper-V tree? It's mostly Hyper-V specific
code, plus some comments and a minor tweak to a utility function in 'mm'.

All comments on earlier versions have been addressed, and it would be
good to get some mileage in linux-next before the 6.9 merge window.