Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

From: Alexey Kardashevskiy
Date: Wed Oct 18 2023 - 22:49:12 EST



On 19/10/23 00:48, Sean Christopherson wrote:
On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:

On 18/10/23 03:27, Sean Christopherson wrote:
On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
+
+ /*
+ * If a VMM-specific certificate blob hasn't been provided, grab the
+ * host-wide one.
+ */
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ if (!snp_certs)
+ snp_certs = sev_snp_global_certs_get();
+

This is where the generation I suggested adding would get checked. If
the instance certs' generation is not the global generation, then I
think we need a way to return to the VMM to make that right before
continuing to provide outdated certificates.
This might be an unreasonable request, but the fact that the certs and
reported_tcb can be set while a VM is running makes this an issue.

Before we get that far, the changelogs need to explain why the kernel is storing
userspace blobs in the first place. The whole thing is a bit of a mess.

sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
between bumping the refcount and grabbing the pointer, KVM will end up leaking a
refcount and consuming a pointer without a refcount.

if (!kref_get_unless_zero(&certs->kref))
return NULL;

return certs;

I'm missing something here. The @certs pointer is on the stack,

No, nothing guarantees that @certs is on the stack and will never be reloaded.
sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
possible that it can be inlined. Then you end up with:

struct sev_device *sev;

if (!psp_master || !psp_master->sev_data)
return NULL;

sev = psp_master->sev_data;
if (!sev->snp_initialized)
return NULL;

if (!sev->snp_certs)
return NULL;

if (!kref_get_unless_zero(&sev->snp_certs->kref))
return NULL;

return sev->snp_certs;

At which point the compiler could choose to omit a local variable entirely, it
could store @certs in a register and reload after kref_get_unless_zero(), etc.
If psp_master->sev_data->snp_certs is changed at any point, odd thing can happen.

That atomic operation in kref_get_unless_zero() might prevent a reload between
getting the kref and the return, but it wouldn't prevent a reload between the
!NULL check and kref_get_unless_zero().

Oh. The function is exported so I thought gcc would not go that far but yeah it is possible. So this needs an explicit READ_ONCE barrier.


If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
concern.

The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
"A la vcpu->run" is fine for the latter but for the former we need something
else.

Why? The cert ultimately comes from userspace, no? Make userspace deal with it.

And there is scenario when one global certs blob is what is needed and
copying it over multiple VMs seems suboptimal.

That's a solvable problem. I'm not sure I like the most obvious solution, but it
is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
or via an ioctl().

FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
userspace pointer would suffice. The benefit of a kernel controlled pointer is
that it doesn't require copying to a kernel buffer (or special code to copy from
userspace into guest).

Just to clarify - like, a small userspace non-qemu program which just holds a pointer with the certs blob, or embed it into libvirt or systemd?


Actually, looking at the flow again, AFAICT there's nothing special about the
target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
KVM doesn't need to do conversions, there's no kernel priveleges required, etc.
And the GHCB doesn't dictate ordering between storing the certificates and doing
the request. That means the certificate stuff can be punted entirely to usersepace.

All true.

Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
for non-SNP guests:

unsigned long exitcode = 0;
u64 data_gpa;
int err, rc;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
goto e_fail;
}

e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);

Which really highlights that we need to get test infrastructure up and running
for SEV-ES, SNP, and TDX.

Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
is the definition of KVM_HC_SNP_GET_CERTS and its arguments.

static void snp_handle_guest_request(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct sev_data_snp_guest_request data = {0};
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
gpa_t req_gpa = control->exit_info_1;
gpa_t resp_gpa = control->exit_info_2;
unsigned long rc;
int err;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST;
goto e_fail;
}

sev = &to_kvm_svm(kvm)->sev_info;

mutex_lock(&sev->guest_req_lock);

rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
if (rc)
goto unlock;

rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
if (rc)
/* Ensure an error value is returned to guest. */
rc = err ? err : SEV_RET_INVALID_ADDRESS;

snp_cleanup_guest_buf(&data, &rc);

unlock:
mutex_unlock(&sev->guest_req_lock);

e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
}

static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
{
u64 certs_exitcode = vcpu->run->hypercall.args[2];
struct vcpu_svm *svm = to_svm(vcpu);

if (certs_exitcode)
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
else
snp_handle_guest_request(svm);
return 1;
}

static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
unsigned long exitcode;
u64 data_gpa;

if (!sev_snp_guest(vcpu->kvm)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
return 1;
}

data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
return 1;
}

vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
vcpu->run->hypercall.args[0] = data_gpa;
vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;

btw why is it _LONG_MODE and not just _64? :)

vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
return 0;
}

This should work the KVM stored certs nicely but not for the global certs. Although I am not all convinced that global certs is all that valuable but I do not know the history of that, happened before I joined so I let others to comment on that. Thanks,


--
Alexey