Re: [PATCH v11 17/20] x86/kexec: Flush cache of TDX private memory

From: kirill . shutemov
Date: Fri Jun 09 2023 - 06:27:45 EST


On Mon, Jun 05, 2023 at 02:27:30AM +1200, Kai Huang wrote:
> There are two problems with using kexec() to boot a new kernel when
> the old kernel has enabled TDX: 1) some memory pages are still TDX
> private pages; 2) there might be dirty cachelines associated with TDX
> private pages.
>
> The first problem doesn't matter on platforms without the "partial write
> machine check" erratum. KeyID 0 has no integrity check. If the new
> kernel wants to use any non-zero KeyID, it needs to convert the memory
> to that KeyID, and such a conversion works from any KeyID.
>
> However the old kernel needs to guarantee there's no dirty cacheline
> left behind before booting to the new kernel to avoid silent corruption
> from later cacheline writeback (Intel hardware doesn't guarantee cache
> coherency across different KeyIDs).
>
> There are two things that the old kernel needs to do to achieve that:
>
> 1) Stop accessing TDX private memory mappings:
> a. Stop making TDX module SEAMCALLs (TDX global KeyID);
> b. Stop TDX guests from running (per-guest TDX KeyID).
> 2) Flush any cachelines from previous TDX private KeyID writes.
>
> For 2), use wbinvd() to flush the cache in stop_this_cpu(), as is
> already done for SME. This also takes care of 1) for free, since there
> is no TDX activity between wbinvd() and native_halt().
>
> Flushing the cache in stop_this_cpu() only covers the remote CPUs. On
> the CPU doing kexec(), unlike SME, which flushes the cache in
> relocate_kernel(), flush the cache right after stopping the remote CPUs
> in machine_shutdown(). This is because, on platforms with the above
> erratum, the kernel needs to convert all TDX private pages back to
> normal before a fast warm reset reboot or before booting the new kernel
> via kexec(). Flushing the cache in relocate_kernel() would only cover
> kexec(), not the fast warm reset reboot.
>
> In theory, the cache flush is only needed once the TDX module has been
> initialized. However, the TDX module is initialized on demand at
> runtime, and reading the module status requires taking a mutex.
> Instead, flush the cache whenever TDX is enabled by the BIOS.
>
> Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> Reviewed-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
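For illustration only, the flush ordering described in the quoted changelog can be sketched as a small userspace model. This is not kernel code: struct cpu_model, model_wbinvd(), model_stop_this_cpu(), model_machine_shutdown(), model_safe_to_boot_new_kernel() and the bios_tdx_enabled / sme_supported flags are hypothetical stand-ins for wbinvd(), stop_this_cpu(), machine_shutdown(), the "TDX enabled by the BIOS" check and the SME CPUID check. It assumes remote CPUs flush and then halt, and that the kexec CPU flushes its own cache only after every remote CPU has been stopped.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU model state; not a kernel data structure. */
struct cpu_model {
	bool dirty_private_lines; /* cache may hold lines written via a TDX KeyID */
	bool halted;              /* stands in for having reached native_halt() */
};

static struct cpu_model cpus[NR_CPUS];
static bool bios_tdx_enabled; /* hypothetical stand-in: TDX enabled by the BIOS */
static bool sme_supported;    /* hypothetical stand-in: SME CPUID check */
static int wbinvd_count;      /* number of modelled cache flushes */

/* Models wbinvd(): write back and invalidate this CPU's cache. */
static void model_wbinvd(struct cpu_model *c)
{
	c->dirty_private_lines = false;
	wbinvd_count++;
}

/*
 * Remote CPUs (models stop_this_cpu()): flush first, then halt, so no
 * SEAMCALL or TDX guest can run on this CPU after the flush.
 */
static void model_stop_this_cpu(struct cpu_model *c)
{
	if (sme_supported || bios_tdx_enabled)
		model_wbinvd(c);
	c->halted = true;
}

/*
 * kexec CPU (models machine_shutdown()): stop all remote CPUs, then flush
 * this CPU's cache. SME's own flush on this CPU happens later, in
 * relocate_kernel(), and is not modelled here.
 */
static void model_machine_shutdown(int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	for (int i = 0; i < NR_CPUS; i++)
		if (i != this_cpu)
			model_stop_this_cpu(&cpus[i]);

	if (bios_tdx_enabled)
		model_wbinvd(&cpus[this_cpu]);
}

/* True when no CPU can later write back a stale TDX-private cacheline. */
static bool model_safe_to_boot_new_kernel(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (cpus[i].dirty_private_lines)
			return false;
	return true;
}
```

With bios_tdx_enabled set, every CPU in the model ends up with a clean cache, while the kexec CPU itself stays running. With both flags clear, no flush happens and the dirty flags are left untouched.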

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>

--
Kiryl Shutsemau / Kirill A. Shutemov