Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

From: Sean Christopherson
Date: Fri Nov 12 2021 - 14:48:24 EST


On Fri, Nov 12, 2021, Borislav Petkov wrote:
> On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > Or, is there some mechanism that prevent guest-private memory from being
> > accessed in random host kernel code?

Or random host userspace code...

> So I'm currently under the impression that random host->guest accesses
> should not happen if not previously agreed upon by both.

Key word "should".

> Because, as explained on IRC, if host touches a private guest page,
> whatever the host does to that page, the next time the guest runs, it'll
> get a #VC where it will see that that page doesn't belong to it anymore
> and then, out of paranoia, it will simply terminate to protect itself.
>
> So cloud providers should have an interest to prevent such random stray
> accesses if they wanna have guests. :)

Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

On Fri, Nov 12, 2021, Peter Gonda wrote:
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.

Ah, after further reading, s390 does _not_ do implicit private=>shared conversions.

s390's arch_make_page_accessible() is somewhat similar, but it is not a direct
comparison. IIUC, it exports and integrity protects the data and thus preserves
the guest's data in an encrypted form, e.g. so that it can be swapped to disk.
And if the host corrupts the data, attempting to convert it back to secure on a
subsequent guest access will fail.

The host kernel's handling of the "convert to secure" failures doesn't appear to
be all that robust, e.g. it looks like there are multiple paths where the error
is dropped on the floor and the guest is resumed , but IMO soft hanging the guest
is still better than inducing a fault in the guest, and far better than potentially
coercing the guest into reading corrupted memory ("spurious" PVALIDATE). And s390's
behavior is fixable since it's purely a host error handling problem.

To truly make a page shared, s390 requires the guest to call into the ultravisor
to make a page shared. And on the host side, the host can pin a page as shared
to prevent the guest from unsharing it while the host is accessing it as a shared
page.

So, inducing #VC is similar in the sense that a malicious s390 can also DoS itself,
but is quite different in that (AFAICT) s390 does not create an attack surface where
a malicious or buggy host userspace can induce faults in the guest, or worst case in
SNP, exploit a buggy guest into accepting and accessing corrupted data.

It's also different in that s390 doesn't implicitly convert between shared and
private. Functionally, it doesn't really change the end result because a buggy
host that writes guest private memory will DoS the guest (by inducing a #VC or
corrupting exported data), but at least for s390 there's a sane, legitimate use
case for accessing guest private memory (swap and maybe migration?), whereas for
SNP, IMO implicitly converting to shared on a host access is straight up wrong.

> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.

I disagree, this would require "new" ABI in the sense that it commits KVM to
supporting SNP without requiring userspace to initiate any and all conversions
between shared and private. Which in my mind is the big elephant in the room:
do we want to require new KVM (and kernel?) ABI to allow/force userspace to
explicitly declare guest private memory for TDX _and_ SNP, or just TDX?