Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec

From: Huang, Kai
Date: Mon Feb 19 2024 - 22:12:42 EST


On Mon, 2024-02-19 at 17:16 +0100, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
> > From: Kai Huang <kai.huang@xxxxxxxxx>
> >
> > Currently on AMD SME platforms, during kexec() caches are flushed
> > manually before jumping to the new kernel due to memory encryption.
> > Intel TDX needs to flush cachelines of TDX private memory before
> > jumping to the second kernel too, otherwise they may silently corrupt
> > the new kernel.
> >
> > Instead of sprinkling both AMD and Intel's specific checks around,
> > introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
> > Intel and AMD, and simplify the logic:
> >
> > Could the old kernel leave incoherent caches around?
>
> "Is it possible that the kernel could leave caches in incoherent state?"

Will do. Thanks.

>
> > If so, do WBINVD.
> >
> > Convert the AMD SME to use this new CC attribute.
>
> > A later patch will
> > utilize this new attribute for Intel TDX too.
>
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.

Right. Makes sense.

>
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
>
> In this particular case, that "later patch" can go.

Will remove the "later patch" sentence. Thanks.

>
> > Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
> > 2) relocate_kernel(). stop_this_cpu() checks the CPUID directly to do
> > WBINVD for the reason that the current kernel's SME enabling status may
> > not match the new kernel's choice. However the relocate_kernel() only
> > does the WBINVD when the current kernel has enabled SME for the reason
> > that the new kernel is always placed in an "unencrypted" area.
> >
> > To simplify the logic, for AMD SME change to always use the way that is
> > done in stop_this_cpu(). This will cause an additional WBINVD in
> > relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
> > disabled by kernel command line), but this is acceptable for the sake of
> > having less complicated code (see [1] for the relevant discussion).
> >
> > Note currently the kernel only advertises CC vendor for AMD SME when SME
> > is actually enabled by the kernel. To always advertise the new
> > CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
> > status, change to set CC vendor as long as the hardware has enabled SME.
> >
> > Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
> > enabled SME" is still different from "checking the CPUID" (the way that
> > is done in stop_this_cpu()), but technically the former also serves the
> > purpose and is actually more accurate.
> >
> > Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
> > But this doesn't impact other CC attributes on AMD platforms, nor does
> > it impact the cc_mkdec()/cc_mkenc().
> >
> > [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@xxxxxxx/
> >
> > Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxx>
> > Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> > ---
> > arch/x86/coco/core.c | 13 +++++++++++++
> > arch/x86/kernel/machine_kexec_64.c | 2 +-
> > arch/x86/kernel/process.c | 14 +++-----------
> > arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
> > include/linux/cc_platform.h | 15 +++++++++++++++
> > 5 files changed, 42 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index eeec9986570e..8d6d727e6e18 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> > case CC_ATTR_HOST_MEM_ENCRYPT:
> > return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> >
> > + case CC_ATTR_HOST_MEM_INCOHERENT:
> > + /*
> > + * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> > + * enabled on the platform regardless whether the kernel
> > + * has actually enabled the SME.
> > +
>
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
>
> I think this needs to be:
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>
> switch (attr) {
> case CC_ATTR_MEM_ENCRYPT:
> +
> + /*
> + * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> + * enabled on the platform regardless whether the kernel
> + * has actually enabled the SME.
> + */
> + case CC_ATTR_HOST_MEM_INCOHERENT:
> return sme_me_mask;

As Tom pointed out this will return false when kernel doesn't active SME due to
command line, and WBINVD won't be done.

>
> case CC_ATTR_HOST_MEM_ENCRYPT:
>
>
> > + return !(sev_status & MSR_AMD64_SEV_ENABLED);
> > +
> > + /*
> > + * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
> > + * as it must be true when there's any SEV enable bit set in
> > + * sev_status.
> > + */
>
> Superfluous comment.

Will remove.

>
> > case CC_ATTR_GUEST_MEM_ENCRYPT:
> > return sev_status & MSR_AMD64_SEV_ENABLED;
> >
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index bc0a5348b4a6..c9c6974e2e9c 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
> > (unsigned long)page_list,
> > image->start,
> > image->preserve_context,
> > - cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
> > + cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
> >
> > #ifdef CONFIG_KEXEC_JUMP
> > if (image->preserve_context)
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..2c7e8d9889c0 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
> > mcheck_cpu_clear(c);
> >
> > /*
> > - * Use wbinvd on processors that support SME. This provides support
> > - * for performing a successful kexec when going from SME inactive
> > - * to SME active (or vice-versa). The cache must be cleared so that
> > - * if there are entries with the same physical address, both with and
> > - * without the encryption bit, they don't race each other when flushed
> > - * and potentially end up with the wrong entry being committed to
> > - * memory.
> > - *
> > - * Test the CPUID bit directly because the machine might've cleared
> > - * X86_FEATURE_SME due to cmdline options.
> > + * Use wbinvd on processors that the first kernel *could*
> > + * potentially leave incoherent cachelines.
>
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.

Will remove. Thanks.

>
> > */
> > - if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > + if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
> > native_wbinvd();
> >
> > /*
> > diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> > index 7f72472a34d6..87e4fddab770 100644
> > --- a/arch/x86/mm/mem_encrypt_identity.c
> > +++ b/arch/x86/mm/mem_encrypt_identity.c
> > @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
> > msr = __rdmsr(MSR_AMD64_SYSCFG);
> > if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
> > return;
> > +
> > + /*
> > + * Always set CC vendor when the platform has SME enabled
> > + * regardless whether the kernel will actually activates the
> > + * SME or not. This reports the CC_ATTR_HOST_MEM_INCOHERENT
> > + * being true as long as the platform has SME enabled so that
> > + * stop_this_cpu() can do necessary WBINVD during kexec().
> > + */
> > + cc_vendor = CC_VENDOR_AMD;
> > } else {
> > /* SEV state cannot be controlled by a command line option */
> > sme_me_mask = me_mask;
> > + cc_vendor = CC_VENDOR_AMD;
> > goto out;
> > }
> >
>
> So you can't put it before the if - just slap it in both branches. Geez!

I think we can. Please see my reply to Tom's email.

>
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
> if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> + cc_vendor = CC_VENDOR_AMD;
> +
> /* Check if memory encryption is enabled */
> if (feature_mask == AMD_SME_BIT) {
> /*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
> out:
> RIP_REL_REF(sme_me_mask) = me_mask;
> physical_mask &= ~me_mask;
> - cc_vendor = CC_VENDOR_AMD;
> cc_set_mask(me_mask);
> }
>
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.

Yeah will do. I believe this series was generated based on tip/master but this
was 3 weeks ago.

>
> > @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
> > out:
> > if (sme_me_mask) {
> > physical_mask &= ~sme_me_mask;
> > - cc_vendor = CC_VENDOR_AMD;
> > cc_set_mask(sme_me_mask);
> > }
> > }
> > diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> > index cb0d6cd1c12f..2f7273596102 100644
> > --- a/include/linux/cc_platform.h
> > +++ b/include/linux/cc_platform.h
> > @@ -42,6 +42,21 @@ enum cc_attr {
> > */
> > CC_ATTR_HOST_MEM_ENCRYPT,
> >
>
> This goes to the end of the enum.
>
> > + /**
> > + * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
> > + * incoherent
>
> "...can leave caches in an incoherent state."

Will do. And I'll move this CC_ATTR_HOST_MEM_INCOHERENT to the end of the enum
if I understand you correctly.

>
> > + *
> > + * The platform/OS is running as a bare-metal system or a hypervisor.
> > + * The memory encryption engine might have left non-cache-coherent
> > + * data in the caches that needs to be flushed.
> > + *
> > + * Use this in places where the cache coherency of the memory matters
> > + * but the encryption status does not.
> > + *
> > + * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
>
> If that is the case, why do you even need a new CC_ATTR define?
>
> Might as well do:
>
> if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> native_wbinvd();
>
> ?

As Tom pointed out using the CC_ATTR_MEM_ENCRYPT will miss WBINVD when kernel
disables SME by commandline.

>
> > + */
> > + CC_ATTR_HOST_MEM_INCOHERENT,
> > +
>
>