Re: [PATCH Part2 RFC v4 38/40] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

From: Sean Christopherson
Date: Mon Jul 19 2021 - 20:23:53 EST


On Wed, Jul 07, 2021, Brijesh Singh wrote:
> Version 2 of the GHCB specification added support for two SNP Guest Request
> Message NAE events. The events allow an SEV-SNP guest to make requests to the
> SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST API
> defined in the SEV-SNP firmware specification.

  The hypervisor cannot alter the messages without detection nor read the
  plaintext of the messages.

IIUC, this snippet in the spec means KVM can't restrict what requests are made
by the guest. If so, that makes it difficult to detect/ratelimit a misbehaving
guest, and also limits our options if there are firmware issues (hopefully there
aren't). E.g. ratelimiting a guest after KVM has explicitly requested it to
migrate is not exactly desirable.

> The SNP_GUEST_REQUEST requires two unique pages, one page for the request
> and one page for the response. The response page needs to be in the firmware
> state. The GHCB specification says that both pages need to be in the
> hypervisor state, but before executing the SEV-SNP command the response page
> needs to be in the firmware state.

...

> Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
> feature, set the hypervisor feature to advertise it.

It would be helpful if this changelog listed the Guest Requests that are required
for "base" SNP, e.g. to provide some insight as to why we care about guest
requests.

> static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	if (rc)
>  		goto e_free_context;
>
> +	/* Used for rate limiting SNP guest message request, use the default settings */
> +	ratelimit_default_init(&sev->snp_guest_msg_rs);

Is this exposed to userspace in any way? This feels very much like a knob that
needs to be configurable per-VM.

Also, what are the estimated latencies of a guest request? If the worst-case
latency is >200ms, a default ratelimit frequency of 5 Hz isn't going to do a
whole lot.
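
E.g. if userspace could pick the interval/burst per-VM at SNP_INIT time, KVM
could do something like the below (completely untested sketch, the msg_rs_*
fields in struct kvm_snp_init are made up):

	/* Hypothetical userspace-provided knobs; fall back to the defaults. */
	if (params.msg_rs_interval_ms && params.msg_rs_burst)
		ratelimit_state_init(&sev->snp_guest_msg_rs,
				     msecs_to_jiffies(params.msg_rs_interval_ms),
				     params.msg_rs_burst);
	else
		ratelimit_default_init(&sev->snp_guest_msg_rs);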

> +static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +				     gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request data = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_sev_info *sev;
> +	int rc, err = 0;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;

What guarantee do we have that the guest actually understands -EAGAIN? Ditto
for -EINVAL returned by snp_build_guest_buf(). AFAICT, our options are to return
one of the error codes defined in "Table 95. Status Codes for SNP_GUEST_REQUEST"
of the firmware ABI, kill the guest, or ratelimit the guest without returning
control to the guest.
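
E.g. rather than stuffing a kernel errno into sw_exit_info_2, something like
the below (sketch only, SNP_GUEST_REQ_ERR_BUSY is a made-up name, the real
value would have to come from Table 95 / the GHCB spec):

	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
		/* Return a spec-defined status the guest understands. */
		ghcb_set_sw_exit_info_2(ghcb, SNP_GUEST_REQ_ERR_BUSY);
		return;
	}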

> +		goto e_fail;
> +	}
> +
> +	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
> +	if (rc)
> +		goto e_fail;
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	mutex_lock(&kvm->lock);

Question on the VMPCK sequences. The firmware ABI says:

  Each guest has four VMPCKs ... Each message contains a sequence number per
  VMPCK. The sequence number is incremented with each message sent. Messages
  sent by the guest to the firmware and by the firmware to the guest must be
  delivered in order. If not, the firmware will reject subsequent messages ...

Does that mean there are four independent sequences, i.e. four streams the guest
can use "concurrently", or does it mean the overall freshness/integrity check is
composed from four VMPCK sequences, all of which must be correct for the message
to be valid?

If it's the latter, then a traditional mutex isn't really necessary because the
guest must implement its own serialization, e.g. its own mutex or whatever, to
ensure there is at most one request in-flight at any given time. And on the KVM
side it means KVM can simply reject requests if there is already an in-flight
request. It might also give us more/better options for ratelimiting?
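
E.g. if the guest must serialize its own requests, KVM could get away with a
simple in-flight flag instead of taking kvm->lock (untested sketch,
snp_guest_req_in_flight is a made-up per-VM field):

	/* Reject the request if another one is already being serviced. */
	if (test_and_set_bit(0, &sev->snp_guest_req_in_flight)) {
		/* Again, a spec-defined status, not a kernel errno. */
		ghcb_set_sw_exit_info_2(ghcb, SNP_GUEST_REQ_ERR_BUSY);
		return;
	}

	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
	...
	clear_bit(0, &sev->snp_guest_req_in_flight);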

> +	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +	if (rc) {
> +		mutex_unlock(&kvm->lock);

I suspect you reused this pattern from other, more complex code, but here it's
overkill. E.g.

	if (!rc)
		rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
	else if (err)
		rc = err;

	mutex_unlock(&kvm->lock);

	ghcb_set_sw_exit_info_2(ghcb, rc);

> +		/* If we have a firmware error code then use it. */
> +		if (err)
> +			rc = err;
> +
> +		goto e_fail;
> +	}
> +
> +	/* Copy the response after the firmware returns success. */
> +	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> +
> +	mutex_unlock(&kvm->lock);
> +
> +e_fail:
> +	ghcb_set_sw_exit_info_2(ghcb, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +					 gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request req = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	unsigned long data_npages;
> +	struct kvm_sev_info *sev;
> +	unsigned long err;
> +	u64 data_gpa;
> +	int rc;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;
> +		goto e_fail;
> +	}
> +
> +	if (!sev->snp_certs_data) {
> +		pr_err("svm: certs data memory is not allocated\n");
> +		rc = -EFAULT;

Another instance where the kernel's error numbers will not suffice.

> +		goto e_fail;
> +	}
> +
> +	data_gpa = ghcb_get_rax(ghcb);
> +	data_npages = ghcb_get_rbx(ghcb);