Re: [PATCH v15 17/23] x86/kexec: Flush cache of TDX private memory

From: Huang, Kai
Date: Mon Nov 27 2023 - 14:34:01 EST


On Mon, 2023-11-27 at 10:13 -0800, Hansen, Dave wrote:
> On 11/9/23 03:55, Kai Huang wrote:
> ...
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -31,6 +31,7 @@
> > #include <asm/realmode.h>
> > #include <asm/x86_init.h>
> > #include <asm/efi.h>
> > +#include <asm/tdx.h>
> >
> > /*
> > * Power off function, if any
> > @@ -741,6 +742,20 @@ void native_machine_shutdown(void)
> > local_irq_disable();
> > stop_other_cpus();
> > #endif
> > + /*
> > + * stop_other_cpus() has flushed all dirty cachelines of TDX
> > + * private memory on remote cpus. Unlike SME, which does the
> > + * cache flush on _this_ cpu in the relocate_kernel(), flush
> > + * the cache for _this_ cpu here. This is because on the
> > + * platforms with "partial write machine check" erratum the
> > + * kernel needs to convert all TDX private pages back to normal
> > + * before booting to the new kernel in kexec(), and the cache
> > + * flush must be done before that. If the kernel took SME's way,
> > + * it would have to muck with the relocate_kernel() assembly to
> > + * do memory conversion.
> > + */
> > + if (platform_tdx_enabled())
> > + native_wbinvd();
>
> Why can't the TDX host code just set host_mem_enc_active=1?
>
> Sure, you'll end up *using* the SME WBINVD support, but then you don't
> have two different WBINVD call sites. You also don't have to mess with
> a single line of assembly.

I wanted to avoid changing the assembly.

Perhaps the comment isn't very clear. Flushing cache (on the CPU running kexec)
when the host_mem_enc_active=1 is handled in the relocate_kernel() assembly,
which happens at very late stage right before jumping to the new kernel.
However for TDX when the platform has erratum we need to convert TDX private
pages back to normal, which must be done after flushing cache. If we reuse
host_mem_enc_active=1, then we will need to change the assembly code to do that.