Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec

From: Tom Lendacky
Date: Tue Feb 20 2024 - 17:30:28 EST


On 2/20/24 14:07, Huang, Kai wrote:
On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
On 2/20/24 08:28, Borislav Petkov wrote:
On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.

I would've never figured that out just from staring at the test. :-\

Basically, if you are bare-metal, it will return true. And it will only
return true for machines that support SME and have the
MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
the WBINVD called for any machine that supports SME, even if SME is not
possible because the proper bit in the SYS_CFG MSR hasn't been set.

I know what I'm trying to say, let me know if it is making sense...

Yah, thanks for taking the time to explain.

Here's an even more radical idea:

Why not do WBINVD *unconditionally* on the CPU down path?

- it is the opposite of a fast path, i.e., no one cares

- it'll take care of every possible configuration without ugly logic

- it wouldn't hurt to have the caches nice and coherent before going
down

Hmmm.

That's what I initially did, but errors were reported, see commit:
f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

This changelog only mentions "Some issues". Do you know exactly what kind
issues did you see? Are these issues only appeared on SME enabled system or
other non-SME-capable systems too?

I believe the issues were that different Intel systems would hang or reset and it was bisected to that commit that added the WBINVD. It was a while ago, but I remember that they were similar to what the 1f5e7eb7868e commit ended up fixing, which was debugged because sometimes the WBINVD was still occasionally issued resulting in the following patch

9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")

It just means that if we go to an unconditional WBINVD, then we need to be careful.

Thanks,
Tom


Because ...


But that may be because of the issue solved by commit:
1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

... the issue resolved in this commit seems to be hang.