Re: [PATCH 1/3] kvm: wire up KVM_CAP_VM_GPA_BITS for x86

From: Xiaoyao Li
Date: Mon Mar 04 2024 - 22:00:01 EST


On 3/4/2024 11:15 PM, Sean Christopherson wrote:
On Fri, Mar 01, 2024, Gerd Hoffmann wrote:
Add new guest_phys_bits field to kvm_caps, return the value to
userspace when asked for KVM_CAP_VM_GPA_BITS capability.

Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
Vendor modules (i.e. vmx and svm) can adjust this field in case
additional restrictions apply, for example in case EPT has no
support for 5-level paging.

Signed-off-by: Gerd Hoffmann <kraxel@xxxxxxxxxx>
---
arch/x86/kvm/x86.h | 2 ++
arch/x86/kvm/x86.c | 5 +++++
2 files changed, 7 insertions(+)

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..e03aec3527f8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -24,6 +24,8 @@ struct kvm_caps {
bool has_bus_lock_exit;
/* notify VM exit supported? */
bool has_notify_vmexit;
+ /* usable guest phys bits */
+ u32 guest_phys_bits;
u64 supported_mce_cap;
u64 supported_xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48a61d283406..e270b9b708d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
r |= BIT(KVM_X86_SW_PROTECTED_VM);
break;
+ case KVM_CAP_VM_GPA_BITS:
+ r = kvm_caps.guest_phys_bits;

This is not a fast path, just compute the effective guest.MAXPHYADDR on the fly
using tdp_root_level and max_tdp_level. But as pointed out and discussed in the
previous thread, adverising a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR
simply doesn't work[*].

I thought the plan was to add a way for KVM to advertise the maximum *addressable*
GPA, and figure out a way to communicate that to the guest, e.g. so that firmware
doesn't try to use legal GPAs that the host cannot address.

From one off-list email thread, Paolo was proposing to change the definition of CPUID.0x80000008:EAX[23:16] to "Maximum usable physical address size in bits", in detail:

Maximum usable physical address size in bits. Physical addresses
above this size should not be used, but will not produce a "reserved"
page fault. When this field is zero, all bits up to PhysAddrSize are
usable. This field is expected to be nonzero only on guests where
the hypervisor is using nesting paging.

As I understand it, it turns bit [23:16] of EAX of CPUID 0x80000008 into a PV field, that is set by VMM(e.g., KVM/QEMU) and consumed by guest.

So KVM can advertise maximum addressable/usable physical address bits in CPUID.0x80000008:EAX[23:16] via GET_SUPPORTED_CPUID.


Paolo, any update on this?

[*] https://lore.kernel.org/all/CALMp9eTutnTxCjQjs-nxP=XC345vTmJJODr+PcSOeaQpBW0Skw@xxxxxxxxxxxxxx