Re: [RESEND RFC PATCH] x86/bugs: Add "unknown" reporting for MMIO Stale Data

From: Pawan Gupta
Date: Fri Jul 29 2022 - 14:00:07 EST


On Thu, Jul 28, 2022 at 12:08:39PM -0700, Dave Hansen wrote:
> On 7/14/22 18:30, Pawan Gupta wrote:
> > Older CPUs beyond their servicing period are not listed in the affected
> > processor list for MMIO Stale Data vulnerabilities. These CPUs currently
> > report "Not affected" in sysfs, which may not be correct.
>
> I'd kinda like to remove the talk about the "servicing period" in this
> patch. First, it's a moving target. CPUs can move in and out of their
> servicing period as Intel changes its mind, or simply as time passes.
>
> Intel could also totally choose to report a CPU as vulnerable *AND* have
> it be outside its service period. Or, some good Samaritan community
> member might be able to test a crusty old CPU and determine if it's
> vulnerable.
>
> > diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
> > index 9393c50b5afc..55524e0798da 100644
> > --- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
> > +++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
> > @@ -230,6 +230,9 @@ The possible values in this file are:
> > * - 'Mitigation: Clear CPU buffers'
> > - The processor is vulnerable and the CPU buffer clearing mitigation is
> > enabled.
> > + * - 'Unknown: CPU is beyond its Servicing period'
> > + - The processor vulnerability status is unknown because it is
> > + out of Servicing period. Mitigation is not attempted.
>
> Unknown: Processor vendor did not provide vulnerability status.
>
> > If the processor is vulnerable then the following information is appended to
> > the above information:
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 0dd04713434b..dd6e78d370bc 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -416,6 +416,7 @@ enum mmio_mitigations {
> > MMIO_MITIGATION_OFF,
> > MMIO_MITIGATION_UCODE_NEEDED,
> > MMIO_MITIGATION_VERW,
> > + MMIO_MITIGATION_UNKNOWN,
> > };
> >
> > /* Default mitigation for Processor MMIO Stale Data vulnerabilities */
> > @@ -426,12 +427,18 @@ static const char * const mmio_strings[] = {
> > [MMIO_MITIGATION_OFF] = "Vulnerable",
> > [MMIO_MITIGATION_UCODE_NEEDED] = "Vulnerable: Clear CPU buffers attempted, no microcode",
> > [MMIO_MITIGATION_VERW] = "Mitigation: Clear CPU buffers",
> > + [MMIO_MITIGATION_UNKNOWN] = "Unknown: CPU is beyond its servicing period",
> > };
>
> Let's just say:
>
> Unknown: no mitigations
>
> or even just: "Unknown"
>
> > static void __init mmio_select_mitigation(void)
> > {
> > u64 ia32_cap;
> >
> > + if (mmio_stale_data_unknown()) {
> > + mmio_mitigation = MMIO_MITIGATION_UNKNOWN;
> > + return;
> > + }
> > +
> > if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
> > cpu_mitigations_off()) {
> > mmio_mitigation = MMIO_MITIGATION_OFF;
> > @@ -1638,6 +1645,7 @@ void cpu_bugs_smt_update(void)
> > pr_warn_once(MMIO_MSG_SMT);
> > break;
> > case MMIO_MITIGATION_OFF:
> > + case MMIO_MITIGATION_UNKNOWN:
> > break;
> > }
> >
> > @@ -2235,7 +2243,8 @@ static ssize_t tsx_async_abort_show_state(char *buf)
> >
> > static ssize_t mmio_stale_data_show_state(char *buf)
> > {
> > - if (mmio_mitigation == MMIO_MITIGATION_OFF)
> > + if (mmio_mitigation == MMIO_MITIGATION_OFF ||
> > + mmio_mitigation == MMIO_MITIGATION_UNKNOWN)
> > return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);
> >
> > if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 736262a76a12..82088410870e 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1286,6 +1286,22 @@ static bool arch_cap_mmio_immune(u64 ia32_cap)
> > ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
> > }
> >
> > +bool __init mmio_stale_data_unknown(void)
> > +{
> > + u64 ia32_cap = x86_read_arch_cap_msr();
> > +
> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > + return false;
>
> Let's say why Intel is the special snowflake. Maybe:
>
> /*
> * Intel does not document vulnerability information for old
> * CPUs. This means that only Intel CPUs can have unknown
> * vulnerability state.
> */
>
> > + /*
> > + * CPU vulnerability is unknown when hardware doesn't set the
> > + * immunity bits and the CPU is not in the known affected list.
> > + */
> > + if (!cpu_matches(cpu_vuln_blacklist, MMIO) &&
> > + !arch_cap_mmio_immune(ia32_cap))
> > + return true;
> > + return false;
> > +}
> > +
> > static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
> > {
> > u64 ia32_cap = x86_read_arch_cap_msr();
> > @@ -1349,14 +1365,8 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
> > cpu_matches(cpu_vuln_blacklist, SRBDS | MMIO_SBDS))
> > setup_force_cpu_bug(X86_BUG_SRBDS);
> >
> > - /*
> > - * Processor MMIO Stale Data bug enumeration
> > - *
> > - * Affected CPU list is generally enough to enumerate the vulnerability,
> > - * but for virtualization case check for ARCH_CAP MSR bits also, VMM may
> > - * not want the guest to enumerate the bug.
> > - */
> > - if (cpu_matches(cpu_vuln_blacklist, MMIO) &&
> > + /* Processor MMIO Stale Data bug enumeration */
> > + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> > !arch_cap_mmio_immune(ia32_cap))
> > setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
>
> Yeah, this is all looking a little clunky.
>
> Maybe we just need a third state of cpu_has_bug() for all this and we
> shouldn't try cramming it in the MMIO-specific code and diluting the
> specificity of boot_cpu_has_bug().
>
> Then the selection logic becomes simple:
>
> 	if (!arch_cap_mmio_immune(ia32_cap)) {
> 		if (cpu_matches(cpu_vuln_blacklist, MMIO))
> 			setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
> 		else if (x86_vendor == X86_VENDOR_INTEL)
> 			setup_force_unknown_bug(X86_BUG_MMIO...);
> 	}
>
> ... and then spit out the "Unknown" in the common code, just like the
> treatment "Not affected" gets.
>
> static ssize_t cpu_show_common(...)
> {
> 	if (!boot_cpu_has_bug(bug))
> 		return sprintf(buf, "Not affected\n");
> +
> +	if (boot_cpu_unknown_bug(bug))
> +		return sprintf(buf, "Unknown\n");
>
> Thoughts?

Sounds good. Borislav suggested something similar: adding
X86_BUG_MMIO_UNKNOWN. I will see if I can combine both approaches.
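
To make the three-state idea concrete, here is a minimal standalone sketch
of the selection and reporting logic discussed above. It is not kernel
code: the names bug_state, cpu_model, mmio_classify() and mmio_show() are
hypothetical, and the three booleans only stand in for the vendor check,
cpu_matches(cpu_vuln_blacklist, MMIO) and arch_cap_mmio_immune(ia32_cap).
The "Vulnerable" string is a placeholder for the affected case, which in
the real code goes on to report the selected mitigation.

```c
#include <stdbool.h>

/* Hypothetical tri-state for one bug; not an existing kernel type. */
enum bug_state {
	BUG_NOT_AFFECTED,
	BUG_AFFECTED,
	BUG_UNKNOWN,
};

/* Hypothetical stand-ins for the inputs used in this thread. */
struct cpu_model {
	bool vendor_intel;	/* x86_vendor == X86_VENDOR_INTEL */
	bool in_affected_list;	/* cpu_matches(cpu_vuln_blacklist, MMIO) */
	bool arch_cap_immune;	/* arch_cap_mmio_immune(ia32_cap) */
};

/*
 * Same order as the suggested selection logic: immunity bits win,
 * then the affected list, and only then does an unlisted Intel CPU
 * fall into the "unknown" state.
 */
static enum bug_state mmio_classify(const struct cpu_model *c)
{
	if (c->arch_cap_immune)
		return BUG_NOT_AFFECTED;
	if (c->in_affected_list)
		return BUG_AFFECTED;
	if (c->vendor_intel)
		return BUG_UNKNOWN;
	return BUG_NOT_AFFECTED;
}

/* Common reporting: "Unknown" is handled next to "Not affected". */
static const char *mmio_show(enum bug_state state)
{
	switch (state) {
	case BUG_AFFECTED:
		return "Vulnerable";	/* placeholder for mitigation strings */
	case BUG_UNKNOWN:
		return "Unknown";
	default:
		return "Not affected";
	}
}
```

With this ordering, an Intel CPU that is neither in the affected list nor
marked immune classifies as BUG_UNKNOWN, while a non-Intel CPU in the same
situation stays BUG_NOT_AFFECTED.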