Re: [PATCH v11 18/20] x86: Handle TDX erratum to reset TDX private memory during kexec() and reboot

From: Huang, Kai
Date: Tue Jun 13 2023 - 20:16:10 EST


On Tue, 2023-06-13 at 14:05 +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> On Tue, Jun 13, 2023 at 12:51:23AM +0000, Huang, Kai wrote:
> > On Mon, 2023-06-12 at 06:47 -0700, Dave Hansen wrote:
> > > On 6/12/23 03:27, Huang, Kai wrote:
> > > > So I think a __mb() after setting tdmr->pamt_4k_base should be good enough, as
> > > > it guarantees when setting to any pamt_*_size happens, the valid pamt_4k_base
> > > > will be seen by other cpus.
> > > >
> > > > Does it make sense?
> > >
> > > Just use a normal old atomic_t or set_bit()/test_bit(). They have
> > > built-in memory barriers are are less likely to get botched.
> >
> > Thanks for the suggestion.
> >
> > Hi Dave, Kirill,
> >
> > I'd like to check with you that whether we should introduce a mechanism to track
> > TDX private pages for both this patch and the next.
> >
> > As you can see this patch only deals PAMT pages due to couple of reasons that
> > mnentioned in the changelog. The next MCE patch handles all TDX private pages,
> > but it uses SEAMCALL in the #MC handler. Using SEAMCALL has two cons: 1) it is
> > slow (probably doesn't matter, though); 2) it brings additional risk of
> > triggering further #MC inside TDX module, although such risk should be a
> > theoretical thing.
> >
> > If we introduce a helper to mark a page as TDX private page, then both above
> > patches can utilize it. We don't need to consult TDMRs to get PAMT anymore in
> > this patch (we will need a way to loop all TDX-usable memory pages, but this
> > needs to be done anyway with TDX guests). I believe eventually we can end up
> > with less code.
> >
> > In terms of how to do, for PAMT pages, we can set page->private to a TDX magic
> > number because they come out of page allocator directly. Secure-EPT pages are
> > like PAMT pages too. For TDX guest private pages, Sean is moving to implement
> > KVM's own pseudo filesystem so they will have a unique mapping to identify.
> >
> > https://github.com/sean-jc/linux/commit/40d338c8629287dda60a9f7c800ede8549295a7c
> >
> > And my thinking is in this TDX host series, we can just handle PAMT pages. Both
> > secure-EPT and TDX guest private pages can be handled later in KVM TDX series.
> > I think eventually we can have a function like below to tell whether a page is
> > TDX private page:
> >
> > bool page_is_tdx_private(struct page *page)
> > {
> > if (page->private == TDX_PRIVATE_MAGIC)
> > return true;
> >
> > if (!page_mapping(page))
> > return false;
> >
> > return page_mapping(page)->a_ops == &kvm_gmem_ops;
> > }
> >
> > How does this sound? Or any other comments? Thanks!
>
> If you going to take this path it has to be supported natively by
> kvm_gmem_: it has to provide API for that. 
>

Yes.

> You should not assume that
> page->private is free to use. It is owned by kvm_gmmem.
>

page->private is only for PAMT and SEPT pages. kmem_gmem has it's own mapping
which can be used to identify the pages owned by it.

Hmm.. I think we should just leave them out for now, as they theoretically are
owned by KVM thus can be handled by KVM, e.g., in it's reboot or shutdown or
module unload code path.

If we are fine to use SEAMCALL in the #MC handler code path, I think perhaps we
can just keep using TDMRs to locate PAMTs.