Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec

From: Huang, Kai
Date: Tue Feb 20 2024 - 15:07:31 EST


On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
> On 2/20/24 08:28, Borislav Petkov wrote:
> > On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> > > That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> >
> > I would've never figured that out just from staring at the test. :-\
> >
> > > Basically, if you are bare-metal, it will return true. And it will only
> > > return true for machines that support SME and have the
> > > MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> > > 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> > > 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> > > the WBINVD called for any machine that supports SME, even if SME is not
> > > possible because the proper bit in the SYS_CFG MSR hasn't been set.
> > >
> > > I know what I'm trying to say, let me know if it is making sense...
> >
> > Yah, thanks for taking the time to explain.
> >
> > Here's an even more radical idea:
> >
> > Why not do WBINVD *unconditionally* on the CPU down path?
> >
> > - it is the opposite of a fast path, i.e., no one cares
> >
> > - it'll take care of every possible configuration without ugly logic
> >
> > - it wouldn't hurt to have the caches nice and coherent before going
> > down
> >
> > Hmmm.
>
> That's what I initially did, but errors were reported, see commit:
> f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

This changelog only mentions "Some issues". Do you know exactly what kind
issues did you see? Are these issues only appeared on SME enabled system or
other non-SME-capable systems too?

Because ...

>
> But that may be because of the issue solved by commit:
> 1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

... the issue resolved in this commit seems to be hang.