Re: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared

From: Edgecombe, Rick P
Date: Wed Aug 30 2023 - 19:41:04 EST


On Thu, 2023-07-06 at 09:41 -0700, Michael Kelley wrote:
> To avoid these complexities of the CoCo exception handlers, change
> the core transition code in __set_memory_enc_pgtable() to do the
> following:
>
> 1.  Remove aliasing mappings
> 2.  Remove the PRESENT bit from the PTEs of all transitioning pages

This is a bit of an existing problem, but the failure cases of these
set_memory_en/decrypted() operations do not look to be in great shape.
They could fail halfway through if they need to split the direct map
under memory pressure, in which case some of the callers will see the
error and free the unmapped pages to the direct map. (I was looking at
dma_direct_alloc().) Others just leak the pages.

But the situation before the patch is not much better, since the direct
map change or enc_status_change_prepare/finish() could fail and leave
the pages in an inconsistent state, like this patch is trying to
address.

This lack of rollback on failure for CPA calls requires some
particularly odd handling in all the set_memory() callers. The way to
handle it is to make another CPA call to restore the previous
permission, regardless of the error code returned by the initial call
that failed. The callers depend on any PTE change that was successfully
made having any needed splits already done for those PTEs, so the
restore can succeed at least as far as the failed CPA call got.
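
To make the restore pattern concrete, here is a rough userspace toy
model of it. This is only a sketch under assumptions: struct toy_range,
toy_cpa_set() and toy_set_decrypted_with_rollback() are made-up names
for illustration, not kernel APIs, and the "split" is reduced to a
per-page flag that can fail once with -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Toy model of a page range (hypothetical, not a kernel structure). */
struct toy_range {
	bool enc[NPAGES];        /* true = page is encrypted/private */
	bool split_done[NPAGES]; /* true = mapping already split for this page */
	size_t fail_at;          /* index whose split fails; NPAGES = never */
};

/*
 * Hypothetical stand-in for a CPA call. It walks the range in order and
 * returns -ENOMEM when it hits a page that still needs a split and that
 * split fails. Pages before the failure keep their new state, so there
 * is no rollback inside this function.
 */
int toy_cpa_set(struct toy_range *r, size_t npages, bool enc)
{
	assert(npages <= NPAGES);

	for (size_t i = 0; i < npages; i++) {
		if (!r->split_done[i]) {
			if (i == r->fail_at)
				return -ENOMEM;
			r->split_done[i] = true;
		}
		r->enc[i] = enc;
	}
	return 0;
}

/*
 * Caller-side pattern: on failure, issue a second call that restores the
 * previous state, whatever the first error code was. Every page the
 * first call changed already had its split done, so the restore gets at
 * least as far as the failed call did. The restore's own return value
 * is ignored here because the untouched tail never changed state.
 */
int toy_set_decrypted_with_rollback(struct toy_range *r, size_t npages)
{
	int ret = toy_cpa_set(r, npages, false);

	if (ret)
		toy_cpa_set(r, npages, true);
	return ret;
}
```

In this model the restore only works because a failed call never
changes a page without first completing that page's split. Whether
enc_status_change_prepare/finish() give an equivalent guarantee is
exactly the open question.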

In this CoCo case, apparently enc_status_change_prepare/finish() could
fail too (and maybe not have the same forward progress behavior?). So
I'm not sure what you can do in that case.

I'm also not sure how bad it is to free encryption-mismatched pages. Is
it the same as freeing unmapped pages? (likely oops or panic)

> 3.  Flush the TLB globally
> 4.  Flush the data cache if needed
> 5.  Set/clear the encryption attribute as appropriate
> 6.  Notify the hypervisor of the page status change
> 7.  Add back the PRESENT bit