Re: [PATCH v3 2/6] x86/microcode/core: Take a snapshot before and after applying microcode

From: Ashok Raj
Date: Tue Jan 03 2023 - 14:38:48 EST


On Tue, Jan 03, 2023 at 10:46:56AM -0800, Dave Hansen wrote:
> On 1/3/23 10:02, Ashok Raj wrote:
> > The kernel caches features about each CPU's features at boot in an
> > x86_capability[] structure. The microcode update takes one snapshot and
> > compares it with the saved copy at boot.
> >
> > However, the capabilities in the boot copy can be turned off as a result of
> > certain command line parameters or configuration restrictions. This can
> > cause a mismatch when comparing the values before and after the microcode
> > update.
> >
> > microcode_check() is called after an update to report any previously
> > cached CPUID bits might have changed due to the update.
> >
> > microcode_store_cpu_caps() basically stores the original CPU reported
> > values and not the OS modified values. This will avoid giving a false
> > warning even if no capabilities have changed.
> >
> > Ignore the capabilities recorded at boot. Take a new snapshot before the
> > update and compare with a snapshot after the update to eliminate the false
> > warning.
> ...
>
> It took me a moment to square this "Ignore the capabilities recorded at
> boot." statement with the continued existence of:
>
> memcpy(info->x86_capability, &boot_cpu_data.x86_capability, ...
>
> I think just adding "hardware" might help:
>
> Ignore all hardware capabilities recorded at boot.
>
> Or even adding one more sentence:
>
> Only consult the synthetic capabilities recorded at boot so that
> a simple memcmp() can be used for comparisons.

Rewritten this multiple times, but it still seems confusing, however hard I
try :-(

Its not consulting just the synthetic capabilties, but all real
capabilities reported by hardware including synthetic ones. For e.g.
if we turn off SGX, but HW still exposes it, the new snapshot will
have SGX but previous copy doesn't as it has gone through some more
filtering during enabling. That is what resulted in the miscompare.

Does this replacement sound better?

Ignore any changes to capabilities applied by the kernel during any
feature enabling and check against raw capabilities exposed by the
hardware


>
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 387578049de0..ac2e67156b9b 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -697,6 +697,7 @@ bool xen_set_default_idle(void);
> > #endif
> >
> > void __noreturn stop_this_cpu(void *dummy);
> > +void microcode_store_cpu_caps(struct cpuinfo_x86 *info);
> > void microcode_check(struct cpuinfo_x86 *info);
> >
> > enum l1tf_mitigations {
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index b9c7529c920e..7c86c6fd07ae 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -2297,28 +2297,43 @@ void cpu_init_secondary(void)
> > #endif
> >
> > #ifdef CONFIG_MICROCODE_LATE_LOADING
> > +
> > +void microcode_store_cpu_caps(struct cpuinfo_x86 *info)
> > +{
> > + /* Reload CPUID max function as it might've changed. */
> > + info->cpuid_level = cpuid_eax(0);
> > +
> > + /*
> > + * Copy all capability leafs to pick up the synthetic ones so that
> > + * memcmp() below doesn't fail on that. The ones coming from CPUID will
> > + * get overwritten in get_cpu_cap().
> > + */
> > + memcpy(info->x86_capability, &boot_cpu_data.x86_capability,
> > + sizeof(info->x86_capability));
> > +
> > + get_cpu_cap(info);
> > +}
> > +
> > /*
> > * The microcode loader calls this upon late microcode load to recheck features,
> > * only when microcode has been updated. Caller holds microcode_mutex and CPU
> > * hotplug lock.
> > */
> > -void microcode_check(struct cpuinfo_x86 *info)
> > +void microcode_check(struct cpuinfo_x86 *orig)
> > {
> > - perf_check_microcode();
> > + struct cpuinfo_x86 info;
>
> 'info' is kinda a throwaway name. would this be better as:
>
> prev_info / new_info
>
> instead of:
>
> orig / info
>
> ?

Yes, that sounds better.

>
> > - /* Reload CPUID max function as it might've changed. */
> > - info->cpuid_level = cpuid_eax(0);
> > + perf_check_microcode();
> >
> > /*
> > * Copy all capability leafs to pick up the synthetic ones so that
> > * memcmp() below doesn't fail on that. The ones coming from CPUID will
> > * get overwritten in get_cpu_cap().
> > */
>
> This comment got copied to microcode_store_cpu_caps(). Does this
> instance still need to be here?

I think it still applies.. get_cpu_cap() copies all reported capabilities
and adds the synthetic ones too.

>
> > - memcpy(&info->x86_capability, &boot_cpu_data.x86_capability, sizeof(info->x86_capability));
> > -
> > - get_cpu_cap(info);
> > + microcode_store_cpu_caps(&info);
> >
> > - if (!memcmp(&info->x86_capability, &boot_cpu_data.x86_capability, sizeof(info->x86_capability)))
> > + if (!memcmp(&info.x86_capability, &orig->x86_capability,
> > + sizeof(info.x86_capability)))
> > return;
> >
> > pr_warn("x86/CPU: CPU features have changed after loading microcode, but might not take effect.\n");
> > diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> > index d86a4f910a6b..14d9031ed68a 100644
> > --- a/arch/x86/kernel/cpu/microcode/core.c
> > +++ b/arch/x86/kernel/cpu/microcode/core.c
> > @@ -447,6 +447,13 @@ static int microcode_reload_late(void)
> > atomic_set(&late_cpus_in, 0);
> > atomic_set(&late_cpus_out, 0);
> >
> > + /*
> > + * Take a snapshot before the microcode update, so we can compare
> > + * them after the update is successful to check for any bits
> > + * changed.
> > + */
> > + microcode_store_cpu_caps(&info);
>
> A "we" snuck in there. How about this?
>
> /*
> * Take a snapshot before the microcode update. This enables
> * a later comparison to see any bits changed after an update.
> */
>
> I do think some better naming of 'info' here would be nice too.
> 'old_info' or 'prev_info' seem like good alternatives.

Sounds good.

Cheers,
Ashok