Re: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared

From: Edgecombe, Rick P
Date: Thu Aug 31 2023 - 13:29:58 EST


On Wed, 2023-08-30 at 16:40 -0700, Rick Edgecombe wrote:
> This is a bit of an existing problem, but the failure cases of these
> set_memory_en/decrypted() operations does not look to be in great
> shape. It could fail halfway through if it needs to split the direct
> map under memory pressure, in which case some of the callers will see
> the error and free the unmapped pages to the direct map. (I was
> looking
> at dma_direct_alloc()) Other's just leak the pages.
>
> But the situation before the patch is not much better, since the
> direct
> map change or enc_status_change_prepare/finish() could fail and leave
> the pages in an inconsistent state, like this patch is trying to
> address.
>
> This lack of rollback on failure for CPA calls needs particular odd
> handling in all the set_memory() callers. The way is to make a CPA
> call
> to restore it to the previous permission, regardless of the error
> code
> returned in the initial call that failed. The callers depend on any
> PTE
> change successfully made having any needed splits already done for
> those PTEs, so the restore can succeed at least as far as the failed
> CPA call got.

Wait, since this does set_memory_np() as the first step for both
set_memory_encrypted() and set_memory_decrypted(), that pattern in the
callers wouldn't work. I wonder if it should try to rollback itself if
set_memory_np() fails (call set_memory_p() before returning the error).
At least that will handle failures that happen on the guest side.

>
> In this COCO case apparently the enc_status_change_prepare/finish()
> could fail too (and maybe not have the same forward progress
> behavior?). So I'm not sure what you can do in that case.
>
> I'm also not sure how bad it is to free encryption mismatched pages.
> Is
> it the same as freeing unmapped pages? (likely oops or panic)