Re: [PATCH v8 3/6] KVM: x86: Virtualize CR3.LAM_{U48,U57}

From: Huang, Kai
Date: Wed May 10 2023 - 08:00:06 EST


On Wed, 2023-05-10 at 14:06 +0800, Binbin Wu wrote:
> From: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
>
> Add support to allow guests to set two new CR3 non-address control bits for
> guests to enable the new Intel CPU feature Linear Address Masking (LAM) on user
> pointers.
>
> LAM modifies the checking that is applied to 64-bit linear addresses, allowing
> software to use of the untranslated address bits for metadata and masks the
> metadata bits before using them as linear addresses to access memory. LAM uses
> two new CR3 non-address bits LAM_U48 (bit 62) and AM_U57 (bit 61) to configure
> LAM for user pointers. LAM also changes VMENTER to allow both bits to be set in
> VMCS's HOST_CR3 and GUEST_CR3 for virtualization.
>
> When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set any of
> the two LAM control bits. However, when EPT is off, the actual CR3 used by the
> guest is generated from the shadow MMU root which is different from the CR3 that
> is *set* by the guest, and KVM needs to manually apply any active control bits
> to VMCS's GUEST_CR3 based on the cached CR3 *seen* by the guest.
>
> KVM manually checks guest's CR3 to make sure it points to a valid guest physical
> address (i.e. to support smaller MAXPHYSADDR in the guest). Extend this check
> to allow the two LAM control bits to be set. And to make such check generic,
> introduce a new field 'cr3_ctrl_bits' to vcpu to record all feature control bits
> that are allowed to be set by the guest. After check, non-address bits of guest
> CR3 will be stripped off to extract guest physical address.
>
> In case of nested, for a guest which supports LAM, both VMCS12's HOST_CR3 and
> GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when L0 enters
> L1 to emulate a VMEXIT from L2 to L1 or when L0 enters L2 directly. KVM also
> manually checks VMCS12's HOST_CR3 and GUEST_CR3 being valid physical address.
> Extend such check to allow the new LAM control bits too.
>
> Note, LAM doesn't have a global control bit to turn on/off LAM completely, but
> purely depends on hardware's CPUID to determine it can be enabled or not. That
> means, when EPT is on, even when KVM doesn't expose LAM to guest, the guest can
> still set LAM control bits in CR3 w/o causing problem. This is an unfortunate
> virtualization hole. KVM could choose to intercept CR3 in this case and inject
> fault but this would hurt performance when running a normal VM w/o LAM support.
> This is undesirable. Just choose to let the guest do such illegal thing as the
> worst case is guest being killed when KVM eventually find out such illegal
> behaviour and that is the guest to blame.
>
> Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.
> Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested code,
> to provide a clear distinction b/t CR3 and GPA checks.
>
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
> Co-developed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Signed-off-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Tested-by: Xuelian Guo <xuelian.guo@xxxxxxxxx>
>

[...]

> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
> hpa_t root;
>
> root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> - root_gfn = root_pgd >> PAGE_SHIFT;
> + /*
> + * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
> + * additional control bits (e.g. LAM control bits). To be generic,
> + * unconditionally strip non-address bits when computing the GFN since
> + * the guest PGD has already been checked for validity.
> + */
> + root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;

Looks it's better to mask off non-address bits based on guest's invalid physical
address bits, for instance, based on vcpu->arch.resereved_gpa_bits. But looking
at the old discussion it appears Sean suggested this way:

https://lore.kernel.org/all/ZAuPPv8PUDX2RBQa@xxxxxxxxxx/

So anyway:

Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>