Re: [PATCH] x86/sev: Add SEV-SNP guest feature negotiation support

From: Nikunj A. Dadhania
Date: Wed Nov 23 2022 - 10:45:35 EST


Hi Boris,

I will work on sending a v2 patch that addresses all your questions/concerns.

On 21/11/22 21:44, Borislav Petkov wrote:
> On Fri, Nov 18, 2022 at 06:58:07PM +0530, Nikunj A. Dadhania wrote:
>> No: For feature flags that need an enlightened guest, older guests
>> should detect and fail booting on any hypervisor that sets this
>> feature flag.
>
> What would happen with such old guests nowadays? Wouldn't they explode
> anyway?

Yes, and the patch is to prevent that and fail gracefully and so I had
CC'ed stable.

> And how is this supposed to work in practice?
>
> I'm guessing this is supposed to address a case where guest owner goes
> to a cloud provider, boots an old guest and it magically survives
> booting and then it runs with some false sense of security.
>
> But isn't that part of the whole attestation dance where the guest owner
> checks for a minimum set of features it expects to be present?
>
> Because if you do this and the cloud provider updates the hypervisor,
> all the guest owner images might stop working all of a sudden because of
> this check.
>
> So what is the actual, real-life example where this helps? At all?

A cloud provider having the hypervisor and qemu with Secure TSC support
tries to boot an old guest kernel (v5.19):

"-object sev-snp-guest,...,secure-tsc=on -kernel vmlinuz-5.19.0"

As per the current behavior, the guest kernel will hang.

This patch will make sure that if the hypervisor enables this feature, the
guest kernel will know early that an unsupported SNP feature is enabled. The guest
then sends a terminate message to the hypervisor, indicating that its not able
to fulfill hypervisor request to enable SecureTSC and cannot move ahead.

>> The hypervisor can enable various new features flags(in
>> SEV_FEATURES[1:63]) and start the guest. The hypervisor does not know
>> beforehand that the guest kernel supports the feature that is being
>> enabled.
>
> This is not the right criterion: it should be more like: can a guest
> still continue booting with a new feature it doesn't know about and
> still provide the same security.

That is indeed the criterion. Hence, instead of allowing the guest to
continue and have it fail randomly later(which would be a debugging nightmare),
this is to detect the unsupported feature early and fail gracefully.

> But see above - you need to think very practically here before even
> considering such a thing.
>
>> While booting, SNP guests can discover the hypervisor-enabled features
>> by looking at SEV_STATUS MSR. At this point, the SNP guest needs to
>> decide either to continue boot or terminate depending on the feature
>> support in the guest kernel. Otherwise, if we let the guest continue
>> booting with an unsupported feature, the guest can fail in non-obvious
>> manner.
>
> How can an old guest decide when it doesn't even have the intelligence
> to do so?
>
> What you're doing is, have old guests immediately stop booting when they
> encounter a new feature - even if that new feature doesn't impair their
> security.

A slight correction:
a) If the new feature was enabled by the HV/QEMU.
AND
b) The new feature requires guest enablement

" -object sev-snp-guest,...,secure-tsc=on -kernel vmlinuz-5.19.0 "

And it is not done wholesale.

>
>> +-------------------+----------+-------------+----------+
>> | HypervisorEnabled | Required | Available | Boot |
>> +-------------------+----------+-------------+----------+
>> | Y | Y | N | N (Fail) |
>
> This means that you need to know those features which would fail an old
> guest *upfront*. Like right now. And I hardly doubt that.
>
>> PREVENT_HOST_IBS will be defined but shouldn't be part of the
>> "Required" mask.
>
> So it doesn't need to be mentioned at all.

Yes, we could get rid of defines that do not need to be included in
required mask.

>> SECURE_TSC should be part of "Required" mask and once secure tsc
>> support is up-streamed it should be added to "Available" mask.
>
> So when a guest owner gets a new guest which supports SECURE_TSC and
> tries to boot it on an older HV - which is very much pretty everywhere
> the case - cloud providers won't update their HV kernel whenever - that
> new guest won't boot at all.

I think you misunderstood, if the HV does not have Secure TSC and new guest
has Secure TSC, still the guest should boot. As the hypervisor didnt request
the feature to be enabled (below row in the table)

+-------------------+----------+-------------+----------+
| HypervisorEnabled | Required | Available | Boot |
+-------------------+----------+-------------+----------+
| N | Y/N | Y/N | Y |
+-------------------+----------+-------------+----------+

>
> Which is a bad bad idea. I don't think you want that.
>
> What you want, rather, is to say in the SECURE_TSC enablement code
>
> pr_warn("HV doesn't support secure TSC - your guest won't be 100% secure\n");
>
> or so.
>
>> Guests need to check for features enabled by the hypervisor that is not
>> supported as well, so that we can terminate the guest right there.
>
> That needs to be done in the feature enablement code of each feature but
> not wholesale like this.
>
> Anyway, I think you get my point.

Regards
Nikunj