Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

From: Xu Yilun
Date: Wed Mar 06 2024 - 04:48:00 EST


On Tue, Feb 27, 2024 at 06:41:36PM -0800, Sean Christopherson wrote:
> Add and use a synthetic, KVM-defined page fault error code to indicate
> whether a fault is to private vs. shared memory. TDX and SNP have
> different mechanisms for reporting private vs. shared, and KVM's
> software-protected VMs have no mechanism at all. Usurp an error code
> flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> and friends.
>
> Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> for TDX and software-protected VMs as appropriate, but that would require
> *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> memory at the hardware layer, but don't utilize private memory at the
> KVM layer.

I see this alternative in other patchset. And I still don't understand why
synthetic way is better after reading patch #5-7. I assume for SEV(-ES) if
there is spurious PFERR_GUEST_ENC flag set in error code when private memory
is not used in KVM, then it is a HW issue or SW bug that needs to be caught
and warned, and by the way cleared.

Thanks,
Yilun

>
> Opportunistically add a comment to call out that the logic for software-
> protected VMs is (and was before this commit) broken for nested MMUs, i.e.
> for nested TDP, as the GPA is an L2 GPA. Punt on trying to play nice with
> nested MMUs as there is a _lot_ of functionality that simply doesn't work
> for software-protected VMs, e.g. all of the paths where KVM accesses guest
> memory need to be updated to be aware of private vs. shared memory.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 11 +++++++++++
> arch/x86/kvm/mmu/mmu.c | 26 +++++++++++++++++++-------
> arch/x86/kvm/mmu/mmu_internal.h | 2 +-
> 3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1e69743ef0fb..4077c46c61ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -267,7 +267,18 @@ enum x86_intercept_stage;
> #define PFERR_GUEST_ENC_MASK BIT_ULL(34)
> #define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
> #define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
> +
> +/*
> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> + * when emulating instructions that triggers implicit access.
> + */
> #define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
> +/*
> + * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
> + * when the guest was accessing private memory.
> + */
> +#define PFERR_PRIVATE_ACCESS BIT_ULL(49)
> +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
>
> #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> PFERR_WRITE_MASK | \
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 408969ac1291..7807bdcd87e8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> bool direct = vcpu->arch.mmu->root_role.direct;
>
> /*
> - * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> - * checks when emulating instructions that triggers implicit access.
> * WARN if hardware generates a fault with an error code that collides
> - * with the KVM-defined value. Clear the flag and continue on, i.e.
> - * don't terminate the VM, as KVM can't possibly be relying on a flag
> - * that KVM doesn't know about.
> + * with KVM-defined sythentic flags. Clear the flags and continue on,
> + * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> + * flag that KVM doesn't know about.
> */
> - if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> - error_code &= ~PFERR_IMPLICIT_ACCESS;
> + if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> + error_code &= ~PFERR_SYNTHETIC_MASK;
>
> if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> return RET_PF_RETRY;
>
> + /*
> + * Except for reserved faults (emulated MMIO is shared-only), set the
> + * private flag for software-protected VMs based on the gfn's current
> + * attributes, which are the source of truth for such VMs. Note, this
> + * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> + * currently supported nested virtualization (among many other things)
> + * for software-protected VMs.
> + */
> + if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> + !(error_code & PFERR_RSVD_MASK) &&
> + vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> + kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> + error_code |= PFERR_PRIVATE_ACCESS;
> +
> r = RET_PF_INVALID;
> if (unlikely(error_code & PFERR_RSVD_MASK)) {
> r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 1fab1f2359b5..d7c10d338f14 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> .max_level = KVM_MAX_HUGEPAGE_LEVEL,
> .req_level = PG_LEVEL_4K,
> .goal_level = PG_LEVEL_4K,
> - .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> + .is_private = err & PFERR_PRIVATE_ACCESS,
> };
> int r;
>
> --
> 2.44.0.278.ge034bb2e1d-goog
>
>