Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

From: Michael Roth
Date: Fri Nov 10 2023 - 17:08:32 EST


On Wed, Oct 18, 2023 at 06:48:59AM -0700, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
> >
> > On 18/10/23 03:27, Sean Christopherson wrote:
> > > On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
> > > > > +
> > > > > + /*
> > > > > + * If a VMM-specific certificate blob hasn't been provided, grab the
> > > > > + * host-wide one.
> > > > > + */
> > > > > + snp_certs = sev_snp_certs_get(sev->snp_certs);
> > > > > + if (!snp_certs)
> > > > > + snp_certs = sev_snp_global_certs_get();
> > > > > +
> > > >
> > > > This is where the generation I suggested adding would get checked. If
> > > > the instance certs' generation is not the global generation, then I
> > > > think we need a way to return to the VMM to make that right before
> > > > continuing to provide outdated certificates.
> > > > This might be an unreasonable request, but the fact that the certs and
> > > > reported_tcb can be set while a VM is running makes this an issue.
> > >
> > > Before we get that far, the changelogs need to explain why the kernel is storing
> > > userspace blobs in the first place. The whole thing is a bit of a mess.
> > >
> > > sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> > > bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> > > while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> > > between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> > > refcount and consuming a pointer without a refcount.
> > >
> > > if (!kref_get_unless_zero(&certs->kref))
> > > return NULL;
> > >
> > > return certs;
> >
> > I'm missing something here. The @certs pointer is on the stack,
>
> No, nothing guarantees that @certs is on the stack and will never be reloaded.
> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
> possible that it can be inlined. Then you end up with:
>
> struct sev_device *sev;
>
> if (!psp_master || !psp_master->sev_data)
> return NULL;
>
> sev = psp_master->sev_data;
> if (!sev->snp_initialized)
> return NULL;
>
> if (!sev->snp_certs)
> return NULL;
>
> if (!kref_get_unless_zero(&sev->snp_certs->kref))
> return NULL;
>
> return sev->snp_certs;
>
> At which point the compiler could choose to omit a local variable entirely, it
> could store @certs in a register and reload after kref_get_unless_zero(), etc.
> If psp_master->sev_data->snp_certs is changed at any point, odd thing can happen.
>
> That atomic operation in kref_get_unless_zero() might prevent a reload between
> getting the kref and the return, but it wouldn't prevent a reload between the
> !NULL check and kref_get_unless_zero().
>
> > > If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> > > That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> > > concern.
> >
> > The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
> > "A la vcpu->run" is fine for the latter but for the former we need something
> > else.
>
> Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
>
> > And there is scenario when one global certs blob is what is needed and
> > copying it over multiple VMs seems suboptimal.
>
> That's a solvable problem. I'm not sure I like the most obvious solution, but it
> is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
> or via an ioctl().
>
> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
> userspace pointer would suffice. The benefit of a kernel controlled pointer is
> that it doesn't require copying to a kernel buffer (or special code to copy from
> userspace into guest).
>
> Actually, looking at the flow again, AFAICT there's nothing special about the
> target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
> KVM doesn't need to do conversions, there's no kernel priveleges required, etc.
> And the GHCB doesn't dictate ordering between storing the certificates and doing
> the request. That means the certificate stuff can be punted entirely to usersepace.
>
> Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
> for non-SNP guests:

Thanks for the catch, will fix that up.

>
> unsigned long exitcode = 0;
> u64 data_gpa;
> int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
> goto e_fail;
> }
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>
> Which really highlights that we need to get test infrastructure up and running
> for SEV-ES, SNP, and TDX.
>
> Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.

This sketch seems like a good, flexible way to handle per-VM certs, but
it does complicate things from a userspace perspective. As a basic
requirement, all userspaces will need to provide a way to specify the
initial blob (either a very verbose base64-encoded userspace cmdline param,
or a filepatch that needs additional management to store and handle
permissions/etc.), and also a means to update it (e.g. a HMP/QMP command
for QEMU, some libvirt wrappers, etc.).

That's all well and good if you want to make use of per-VM certs, but we
don't necessarily expect that most deployments will necessarily want to deal
with per-VM certs, and would be happy with a system-wide one where they could
simply issue the /dev/sev ioctl to inject one automatically for all guests.

So we're sort of complicating the more common case to support a more niche
one (as far as userspace is concerned anyway; as far as kernel goes, your
approach is certainly simplest :)).

Instead, maybe a compromise is warranted so the requirements on userspace
side are less complicated for a more basic deployment:

1) If /dev/sev is used to set a global certificate, then that will be
used unconditionally by KVM, protected by simple dumb mutex during
usage/update.
2) If /dev/sev is not used to set the global certificate is the value
is NULL, we assume userspace wants full responsibility for managing
certificates and exit to userspace to request the certs in the manner
you suggested.

Sean, Dionna, would this cover your concerns and address the certificate
update use-case?

-Mike