Re: [PATCHv2 2/3] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()

From: Tom Lendacky
Date: Thu Jun 01 2023 - 14:19:42 EST


On 5/31/23 15:00, Michael Kelley (LINUX) wrote:
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
Sent: Tuesday, May 30, 2023 6:22 AM

Hi,

On 5/30/23 5:57 AM, Tom Lendacky wrote:
On 5/29/23 19:57, Kirill A. Shutemov wrote:
On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy wrote:


On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
Touching privately mapped GPA that is not properly converted to private
with MapGPA and accepted leads to unrecoverable exit to VMM.

load_unaligned_zeropad() can touch memory that is not owned by the
caller, but just happened to next after the owned memory.

/s/to/to be ?

Yep, my bad.

This load_unaligned_zeropad() behaviour makes it important when kernel
asks VMM to convert a GPA from shared to private or back. Kernel must
never have a page mapped into direct mapping (and aliases) as private
when the GPA is already converted to shared or when GPA is not yet
converted to private.

I am wondering whether this issue exists in the AMD code?

IMO, you can add some info on the window in set_memory_encrypted()
where this race exists.

I don't think AMD is affected by load_unaligned_zeropad() the same way
Intel is. But I'm not sure.

Tom, do you have any comments?

Right, shouldn't be an issue for SNP.

Thanks for confirming.


Tom -- For my education, could you elaborate on why this problem can't
occur in an SEV-SNP guest? There's still a window where the direct map
PTE and the RMP as maintained by the hypervisor are out-of-sync. If
load_unaligned_zeropad() does a read using the direct map PTE during
this out-of-sync window, isn't that going to trap to the hypervisor? How
is the scenario handled from there to provide the zeros to
load_unaligned_zeropad()? I need to make sure Hyper-V is doing whatever
is needed. :-)

Ah, I think I misunderstood this when it was being talked about. The window in which SNP would have an issue is after the c-bit is set but before the PVALIDATE is issued. Prior to the RMP being updated, referencing the page will generate an #NPF and automatically change the RMP over to private (in KVM). However, after the guest is resumed, the page will not have been validated, resulting in a #VC with error code 0x404 being generated, which causes the guest to terminate itself.

I suppose, when a 0x404 error code is encountered by the #VC handler, it could call search_exception_tables() and call ex_handler_zeropad() for the EX_TYPE_ZEROPAD type (ex_handler_zeropad is currently static, though).
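For illustration only, here is a small userspace model of what such a zero-pad fixup would be expected to hand back to load_unaligned_zeropad(). It is not kernel code and does not touch the #VC handler; the names word_read_crosses_page(), zeropad_fixup_model() and MODEL_PAGE_SIZE are hypothetical, and 4 KiB pages are assumed. It only shows when a word-sized read spills into the next page, and that the fixup result keeps the bytes up to the end of the enclosing aligned word and zero-fills the rest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: does a word-sized read starting at addr extend
 * past the end of the page that contains addr?
 */
static int word_read_crosses_page(uintptr_t addr)
{
    uintptr_t off = addr & (MODEL_PAGE_SIZE - 1);

    return off > MODEL_PAGE_SIZE - sizeof(unsigned long);
}

/*
 * Hypothetical model of the value a zero-pad fixup produces: the bytes
 * from addr up to the end of the enclosing aligned word are kept, and
 * the bytes that would have come from the following word read as zero.
 */
static unsigned long zeropad_fixup_model(const unsigned char *addr)
{
    size_t offset = (uintptr_t)addr & (sizeof(unsigned long) - 1);
    size_t valid = sizeof(unsigned long) - offset;
    unsigned long val = 0;

    /* Copy only the bytes that belong to the caller's word. */
    memcpy(&val, addr, valid);
    return val;
}
```

In this model, a read that starts three bytes before the end of an aligned word yields those three bytes followed by zeros in memory order, which is the value the caller would see if the neighbouring page could not be touched.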

Thanks,
Tom


Thanks,

Michael