Re: [PATCH v13 17/22] x86/kexec: Flush cache of TDX private memory

From: Huang, Kai
Date: Mon Sep 18 2023 - 18:14:38 EST


On Mon, 2023-09-18 at 08:44 -0700, Dave Hansen wrote:
> On 9/18/23 05:08, Huang, Kai wrote:
> > On Fri, 2023-09-15 at 10:50 -0700, Dave Hansen wrote:
> > > On 9/15/23 10:43, Edgecombe, Rick P wrote:
> > > > On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote:
> > > > > There are two problems in terms of using kexec() to boot to a new
> > > > > kernel when the old kernel has enabled TDX: 1) Part of the memory
> > > > > pages are still TDX private pages; 2) There might be dirty
> > > > > cachelines associated with TDX private pages.
> > > > Does TDX support hibernate?
> > > No.
> > >
> > > There's a whole bunch of volatile state that's generated inside the CPU
> > > and never leaves the CPU, like the ephemeral key that protects TDX
> > > module memory.
> > >
> > > SGX, for instance, never even supported suspend, IIRC. Enclaves just
> > > die and have to be rebuilt.
> >
> > Right. AFAICT TDX cannot survive S3 either: all TDX keys are lost when the
> > system enters S3. However, I don't think TDX can be rebuilt after resume the
> > way SGX can. Let me confirm with the TDX folks on this.
>
> By "rebuilt" I mean all private data is totally destroyed and rebuilt
> from scratch. The SGX architecture provides zero help other than
> delivering a fault and saying: "whoops all your data is gone".

Right. For TDX, my worry is that a SEAMCALL could poison memory and thus
trigger #MC inside the kernel, or even inside SEAM, rather than delivering a
fault that the app/kernel can handle, as SGX does. I am confirming with the
TDX team.

>
> > I think we can register syscore_ops->suspend for TDX and refuse to suspend
> > when TDX is enabled. This covers the hibernate case too.
> >
> > As for how to check whether "TDX is enabled", ideally we would check whether
> > the TDX module is actually initialized, but in the worst case we can use
> > platform_tdx_enabled(). (I need to think more about this.)
>
> *Ideally* the firmware would have a choke point where it could just tell
> the OS that it can't suspend rather than the OS having to figure it out.

Agreed. Let me ask the TDX team about this too.
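To make the syscore_ops idea quoted above a bit more concrete, here is a rough
user-space model only (not actual kernel code). The struct below is a stand-in
that mirrors just the callback shape of the kernel's struct syscore_ops, and
tdx_module_initialized is a hypothetical flag. In the kernel the ops would be
passed to register_syscore_ops(), and a non-zero return from ->suspend() makes
syscore_suspend() fail, which aborts both S3 entry and hibernation image
creation.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * User-space stand-in: mirrors only the callback members of the kernel's
 * struct syscore_ops (the real one also embeds a list node).
 */
struct syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
	void (*shutdown)(void);
};

/* Hypothetical flag: would be set once the TDX module is initialized. */
static bool tdx_module_initialized;

static int tdx_syscore_suspend(void)
{
	/* TDX keys are lost across S3/hibernation, so refuse to suspend. */
	if (tdx_module_initialized)
		return -EBUSY;
	return 0;
}

/* In the kernel this would be handed to register_syscore_ops(). */
static struct syscore_ops tdx_syscore_ops = {
	.suspend = tdx_syscore_suspend,
};
```

Whether the check should key off module initialization or
platform_tdx_enabled() is still the open question above; the sketch just uses
a flag.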