Re: [PATCH v2 07/11] KVM: x86: define standard behavior for bits 0/1 of VM type

From: Sean Christopherson
Date: Fri Feb 23 2024 - 11:46:51 EST


On Fri, Feb 23, 2024, Paolo Bonzini wrote:
> Some VM types have characteristics in common; in fact, the only use
> of VM types right now is kvm_arch_has_private_mem and it assumes that
> _all_ VM types have private memory.
>
> So, let the low bits specify the characteristics of the VM type. As
> of we have two special things: whether memory is private, and whether
> guest state is protected. The latter is similar to
> kvm->arch.guest_state_protected, but the latter is only set on a fully
> initialized VM. If both are set, ioctls to set registers will cause
> an error---SEV-ES did not do so, which is a problematic API.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Message-Id: <20240209183743.22030-7-pbonzini@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 9 +++-
> arch/x86/kvm/x86.c | 93 +++++++++++++++++++++++++++------
> 2 files changed, 85 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0bcd9ae16097..15db2697863c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2135,8 +2135,15 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
> void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> int tdp_max_root_level, int tdp_huge_page_level);
>
> +
> +/* Low bits of VM types provide confidential computing capabilities. */
> +#define __KVM_X86_PRIVATE_MEM_TYPE 1
> +#define __KVM_X86_PROTECTED_STATE_TYPE 2
> +#define __KVM_X86_VM_TYPE_FEATURES 3
> +static_assert((KVM_X86_SW_PROTECTED_VM & __KVM_X86_VM_TYPE_FEATURES) == __KVM_X86_PRIVATE_MEM_TYPE);

Aliasing bit 0 to KVM_X86_SW_PROTECTED_VM is gross, e.g. this

#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
+#define KVM_X86_SEV_VM 8
+#define KVM_X86_SEV_ES_VM 10

is _super_ confusing and bound to cause problems.

Oh, good gravy, you're also aliasing __KVM_X86_PROTECTED_STATE_TYPE into SEV_ES_VM.
Curse you and your Rami Code induced decimal-based bitwise shenanigans!!!

I don't see any reason to bleed the flags into KVM's ABI. Even shoving the flags
into kvm->arch.vm_type is unncessary. Aha! As is storing vm_type as an "unsigned
long", since (a) it can't ever be bigger than u32, and (b) in practice a u8 will
suffice since as Mike pointed out we're effectively limited to 31 types before
kvm_vm_ioctl_check_extension() starts returning error codes.

So I vote to skip the aliasing and simply do:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff23712f1f3f..27265ff5fd29 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1279,7 +1279,9 @@ enum kvm_apicv_inhibit {
};

struct kvm_arch {
- unsigned long vm_type;
+ u8 vm_type;
+ bool has_private_memory;
+ bool has_protected_state;
unsigned long n_used_mmu_pages;
unsigned long n_requested_mmu_pages;
unsigned long n_max_mmu_pages;

and then just use straight incrementing values for types, i.e.

#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3

It'll require a bit of extra work in kvm_arch_init_vm(), but I think the end result
will be signifincatly easier to follow.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8746530930d5..e634e5b67516 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5526,21 +5526,30 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> return 0;
> }
>
> -static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> - struct kvm_debugregs *dbgregs)
> +static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> + struct kvm_debugregs *dbgregs)
> {
> unsigned long val;
>
> + if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
> + vcpu->arch.guest_state_protected)
> + return -EINVAL;
> +
> memset(dbgregs, 0, sizeof(*dbgregs));
> memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
> kvm_get_dr(vcpu, 6, &val);
> dbgregs->dr6 = val;
> dbgregs->dr7 = vcpu->arch.dr7;
> + return 0;
> }

> static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
> @@ -5622,6 +5645,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
> {
> int i, r = 0;
>
> + if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
> + vcpu->arch.guest_state_protected)
> + return -EINVAL;
> +
> if (!boot_cpu_has(X86_FEATURE_XSAVE))
> return -EINVAL;
>
> @@ -6010,7 +6037,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> case KVM_GET_DEBUGREGS: {
> struct kvm_debugregs dbgregs;
>
> - kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> + r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> + if (r < 0)

I would strongly prefer to just do

if (r)

as "r < 0" implies that postive return values are possible/allowed.

That said, rather than a mix of open coding checks in kvm_arch_vcpu_ioctl() versus
burying checks in helpers, what about adding a dedicated helper to take care of
everything in one shot? E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bc69b0c9822..f5ca78e75eec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5864,6 +5864,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
}

+static bool kvm_ioctl_accesses_guest_state(unsigned int ioctl)
+{
+ switch (ioctl) {
+ case <...>:
+ return true;
+ default:
+ return false:
+ }
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -5878,6 +5888,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
void *buffer;
} u;

+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected &&
+ kvm_ioctl_accesses_guest_state(ioctl))
+ return -EINVAL;
+
vcpu_load(vcpu);

u.buffer = NULL;

It'll be a much smaller diff, and hopefully easier to audit, too.