Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

From: Sean Christopherson
Date: Wed Oct 18 2023 - 09:49:31 EST


On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>
> On 18/10/23 03:27, Sean Christopherson wrote:
> > On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
> > > > +
> > > > + /*
> > > > + * If a VMM-specific certificate blob hasn't been provided, grab the
> > > > + * host-wide one.
> > > > + */
> > > > + snp_certs = sev_snp_certs_get(sev->snp_certs);
> > > > + if (!snp_certs)
> > > > + snp_certs = sev_snp_global_certs_get();
> > > > +
> > >
> > > This is where the generation I suggested adding would get checked. If
> > > the instance certs' generation is not the global generation, then I
> > > think we need a way to return to the VMM to make that right before
> > > continuing to provide outdated certificates.
> > > This might be an unreasonable request, but the fact that the certs and
> > > reported_tcb can be set while a VM is running makes this an issue.
> >
> > Before we get that far, the changelogs need to explain why the kernel is storing
> > userspace blobs in the first place. The whole thing is a bit of a mess.
> >
> > sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> > bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> > while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> > between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> > refcount and consuming a pointer without a refcount.
> >
> > if (!kref_get_unless_zero(&certs->kref))
> > return NULL;
> >
> > return certs;
>
> I'm missing something here. The @certs pointer is on the stack,

No, nothing guarantees that @certs is on the stack and will never be reloaded.
sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
possible that it can be inlined. Then you end up with:

struct sev_device *sev;

if (!psp_master || !psp_master->sev_data)
return NULL;

sev = psp_master->sev_data;
if (!sev->snp_initialized)
return NULL;

if (!sev->snp_certs)
return NULL;

if (!kref_get_unless_zero(&sev->snp_certs->kref))
return NULL;

return sev->snp_certs;

At which point the compiler could choose to omit a local variable entirely, it
could store @certs in a register and reload after kref_get_unless_zero(), etc.
If psp_master->sev_data->snp_certs is changed at any point, odd thing can happen.

That atomic operation in kref_get_unless_zero() might prevent a reload between
getting the kref and the return, but it wouldn't prevent a reload between the
!NULL check and kref_get_unless_zero().

> > If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> > That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> > concern.
>
> The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
> "A la vcpu->run" is fine for the latter but for the former we need something
> else.

Why? The cert ultimately comes from userspace, no? Make userspace deal with it.

> And there is scenario when one global certs blob is what is needed and
> copying it over multiple VMs seems suboptimal.

That's a solvable problem. I'm not sure I like the most obvious solution, but it
is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
or via an ioctl().

FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
userspace pointer would suffice. The benefit of a kernel controlled pointer is
that it doesn't require copying to a kernel buffer (or special code to copy from
userspace into guest).

Actually, looking at the flow again, AFAICT there's nothing special about the
target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
KVM doesn't need to do conversions, there's no kernel priveleges required, etc.
And the GHCB doesn't dictate ordering between storing the certificates and doing
the request. That means the certificate stuff can be punted entirely to usersepace.

Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
for non-SNP guests:

unsigned long exitcode = 0;
u64 data_gpa;
int err, rc;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
goto e_fail;
}

e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);

Which really highlights that we need to get test infrastructure up and running
for SEV-ES, SNP, and TDX.

Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
is the definition of KVM_HC_SNP_GET_CERTS and its arguments.

static void snp_handle_guest_request(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct sev_data_snp_guest_request data = {0};
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
gpa_t req_gpa = control->exit_info_1;
gpa_t resp_gpa = control->exit_info_2;
unsigned long rc;
int err;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST;
goto e_fail;
}

sev = &to_kvm_svm(kvm)->sev_info;

mutex_lock(&sev->guest_req_lock);

rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
if (rc)
goto unlock;

rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
if (rc)
/* Ensure an error value is returned to guest. */
rc = err ? err : SEV_RET_INVALID_ADDRESS;

snp_cleanup_guest_buf(&data, &rc);

unlock:
mutex_unlock(&sev->guest_req_lock);

e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
}

static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
{
u64 certs_exitcode = vcpu->run->hypercall.args[2];
struct vcpu_svm *svm = to_svm(vcpu);

if (certs_exitcode)
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
else
snp_handle_guest_request(svm);
return 1;
}

static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
unsigned long exitcode;
u64 data_gpa;

if (!sev_snp_guest(vcpu->kvm)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
return 1;
}

data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
return 1;
}

vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
vcpu->run->hypercall.args[0] = data_gpa;
vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
return 0;
}