RE: [PATCH 00/10] Handle set_memory_XXcrypted() errors

From: Michael Kelley (LINUX)
Date: Mon Oct 23 2023 - 12:47:43 EST


From: Dave Hansen <dave.hansen@xxxxxxxxx> Sent: Thursday, October 19, 2023 12:13 PM
>
> On 10/19/23 10:05, Michael Kelley (LINUX) wrote:
> > I'm more in favor of the "simply panic" approach. What you've done
> > in your Patch 1 and Patch 2 is an intriguing way to try to get the memory
> > back into a consistent state. But I'm concerned that there are failure
> > modes that make it less than 100% foolproof (more on that below). If
> > we can't be sure that the memory is back in a consistent state, then the
> > original problem isn't fully solved. I'm also not sure of the value of
> > investing effort to ensure that some error cases are handled without
> > panic'ing. The upside benefit of not panic'ing seems small compared to
> > the downside risk of leaking guest VM data to the host.
>
> panic() should be a last resort. We *always* continue unless we know
> that something is so bad that we're going to make things worse by
> continuing to run.
>
> We shouldn't panic() on the first little thing that goes wrong. If
> folks want *that*, then they can set panic_on_warn.
>
> > My concern about Patches 1 and 2 is that the encryption bit in the PTE
> > is not a reliable indicator of the state that the host thinks the page is
> > in. Changing the state requires two steps (in either order): 1) updating
> > the guest VM PTEs, and 2) updating the host's view of the page state.
> > Both steps may be done on a range of pages. If #2 fails, the guest
> > doesn't know which pages in the batch were updated and which were
> > not, so the guest PTEs may not match the host state. In such a case,
> > set_memory_encrypted() could succeed based on checking the
> > PTEs when in fact the host still thinks some of the pages are shared.
> > Such a mismatch will produce a guest panic later on if the page is
> > referenced.
>
> I think that's OK. In the end, the page state is controlled by the VMM.
> The guest has zero control. All it can do is make the PTEs consistent
> and hold on for dear life. That's a general statement and not specific
> to this problem.
>
> In other words, it's fine for CoCo folks to be paranoid. It's fine for
> them to set panic_on_{warn,oops,whatever}=1. But it's *NOT* fine to say
> that every TDX guest will want to do that.

The premise of this patch set is to not put pages on the Linux
guest free list that are shared. I agree with that premise. But
more precisely, the best we can do is not put pages on the free
list where the guest PTE indicates "shared". Even if the host is
not acting maliciously, errors can cause the guest and host to be
out-of-sync regarding a page's private/shared status. There's no
way to find out for sure if the host status is "private" before
returning such a page to the free list, though if
set_memory_encrypted() succeeds and the host is not
malicious, we should be reasonably safe.
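
To make the mismatch concrete, here is a rough user-space model (not
kernel code). Every name in it (guest_pte, host_view, host_convert,
and so on) is hypothetical and invented for illustration. It assumes
the host can fail partway through a batch, and it shows that a check
based only on the guest PTE can pass for a page the host still
treats as shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

enum page_state { PAGE_PRIVATE, PAGE_SHARED };

/* Hypothetical model: each side's view of every page (all start private). */
static enum page_state guest_pte[NPAGES]; /* encryption bit in the guest PTE */
static enum page_state host_view[NPAGES]; /* what the host believes */

/* Step 1: update the guest PTEs for a range of pages. */
static void guest_set_ptes(size_t start, size_t count, enum page_state to)
{
	assert(start + count <= NPAGES);
	for (size_t i = 0; i < count; i++)
		guest_pte[start + i] = to;
}

/*
 * Step 2: ask the host to convert a batch. In this model the host
 * converts only the first 'host_ok' pages and then fails.
 * Returns 0 if the whole batch was converted, -1 otherwise.
 */
static int host_convert(size_t start, size_t count, enum page_state to,
			size_t host_ok)
{
	assert(start + count <= NPAGES);
	for (size_t i = 0; i < count && i < host_ok; i++)
		host_view[start + i] = to;
	return host_ok >= count ? 0 : -1;
}

/* The only check the guest can make before freeing a page: its own PTE. */
static bool guest_thinks_private(size_t pfn)
{
	return guest_pte[pfn] == PAGE_PRIVATE;
}

/* Pages where the two views disagree. A real guest cannot observe this. */
static size_t count_mismatches(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		if (guest_pte[i] != host_view[i])
			n++;
	return n;
}
```

If four pages are shared successfully and the host then converts only
two of them back to private, setting all four guest PTEs to private
leaves two pages that pass guest_thinks_private() while the model's
host_view still says shared.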

For paranoid CoCo VM users, using panic_on_warn=1 seems
workable. However, with current code and this patch series,
it's possible to have set_memory_decrypted() return an error and
have set_memory_encrypted() fix things up as best it can
without generating any warnings. It seems like we need a
WARN or some equivalent mechanism if either of these fails,
so that CoCo VMs can panic if they don't want to run with any
inconsistencies (again, assuming the host isn't malicious).
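
The kind of mechanism I have in mind could look roughly like the
user-space sketch below. It is not a proposed patch: model_warn(),
panic_on_warn_model, convert_or_warn() and the stub functions are all
hypothetical stand-ins, with model_warn() only imitating how a WARN
combined with panic_on_warn=1 would stop the VM. The stubs take
(addr, numpages) to resemble the set_memory_XXcrypted() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the panic_on_warn setting. */
static bool panic_on_warn_model;
static unsigned int warn_count;

/* Imitates a WARN: always log, and stop if panic_on_warn is set. */
static void model_warn(const char *what, int err)
{
	warn_count++;
	fprintf(stderr, "WARNING: %s failed: %d\n", what, err);
	if (panic_on_warn_model) {
		fprintf(stderr, "panic: panic_on_warn set\n");
		abort();
	}
}

/* A conversion routine: returns 0 on success, nonzero on failure. */
typedef int (*convert_fn)(unsigned long addr, int numpages);

/*
 * Run a conversion and warn on any failure, so the failure is never
 * silent even if the caller goes on to repair the state itself.
 */
static int convert_or_warn(convert_fn fn, const char *name,
			   unsigned long addr, int numpages)
{
	int ret;

	assert(fn != NULL);
	ret = fn(addr, numpages);
	if (ret)
		model_warn(name, ret);
	return ret;
}

/* Stub conversions for demonstration only. */
static int stub_ok(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	return 0;
}

static int stub_fail(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	return -5; /* arbitrary error value for the model */
}
```

With panic_on_warn_model left false, a failing conversion is logged
and the error is returned to the caller. With it set to true, the
same failure stops the program at the point of failure, which is the
behavior a paranoid CoCo VM would be opting into.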

Also, from a troubleshooting standpoint, panic_on_warn=1
will make it easier to diagnose a failure of
set_memory_encrypted()/decrypted() if it is caught
immediately, versus putting a page with an inconsistent state
on the free list and having things blow up later.

Michael