Re: [RESEND RFC PATCH] x86/bugs: Add "unknown" reporting for MMIO Stale Data

From: Dave Hansen
Date: Thu Jul 28 2022 - 15:08:44 EST


On 7/14/22 18:30, Pawan Gupta wrote:
> Older CPUs beyond its Servicing period are not listed in the affected
> processor list for MMIO Stale Data vulnerabilities. These CPUs currently
> report "Not affected" in sysfs, which may not be correct.

I'd kinda like to remove the talk about the "servicing period" in this
patch. First, it's a moving target. CPUs can move in and out of their
servicing period as Intel changes its mind, or simply as time passes.

Intel could also totally choose to report a CPU as vulnerable *AND* have
it be outside its service period. Or, some good Samaritan community
member might be able to test a crusty old CPU and determine if it's
vulnerable.

> diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
> index 9393c50b5afc..55524e0798da 100644
> --- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
> +++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
> @@ -230,6 +230,9 @@ The possible values in this file are:
> * - 'Mitigation: Clear CPU buffers'
> - The processor is vulnerable and the CPU buffer clearing mitigation is
> enabled.
> + * - 'Unknown: CPU is beyond its Servicing period'
> + - The processor vulnerability status is unknown because it is
> + out of Servicing period. Mitigation is not attempted.

Unknown: Processor vendor did not provide vulnerability status.

> If the processor is vulnerable then the following information is appended to
> the above information:
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 0dd04713434b..dd6e78d370bc 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -416,6 +416,7 @@ enum mmio_mitigations {
> MMIO_MITIGATION_OFF,
> MMIO_MITIGATION_UCODE_NEEDED,
> MMIO_MITIGATION_VERW,
> + MMIO_MITIGATION_UNKNOWN,
> };
>
> /* Default mitigation for Processor MMIO Stale Data vulnerabilities */
> @@ -426,12 +427,18 @@ static const char * const mmio_strings[] = {
> [MMIO_MITIGATION_OFF] = "Vulnerable",
> [MMIO_MITIGATION_UCODE_NEEDED] = "Vulnerable: Clear CPU buffers attempted, no microcode",
> [MMIO_MITIGATION_VERW] = "Mitigation: Clear CPU buffers",
> + [MMIO_MITIGATION_UNKNOWN] = "Unknown: CPU is beyond its servicing period",
> };

Let's just say:

Unknown: no mitigations

or even just: "Unknown"

> static void __init mmio_select_mitigation(void)
> {
> u64 ia32_cap;
>
> + if (mmio_stale_data_unknown()) {
> + mmio_mitigation = MMIO_MITIGATION_UNKNOWN;
> + return;
> + }
> +
> if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
> cpu_mitigations_off()) {
> mmio_mitigation = MMIO_MITIGATION_OFF;
> @@ -1638,6 +1645,7 @@ void cpu_bugs_smt_update(void)
> pr_warn_once(MMIO_MSG_SMT);
> break;
> case MMIO_MITIGATION_OFF:
> + case MMIO_MITIGATION_UNKNOWN:
> break;
> }
>
> @@ -2235,7 +2243,8 @@ static ssize_t tsx_async_abort_show_state(char *buf)
>
> static ssize_t mmio_stale_data_show_state(char *buf)
> {
> - if (mmio_mitigation == MMIO_MITIGATION_OFF)
> + if (mmio_mitigation == MMIO_MITIGATION_OFF ||
> + mmio_mitigation == MMIO_MITIGATION_UNKNOWN)
> return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);
>
> if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 736262a76a12..82088410870e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1286,6 +1286,22 @@ static bool arch_cap_mmio_immune(u64 ia32_cap)
> ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
> }
>
> +bool __init mmio_stale_data_unknown(void)
> +{
> + u64 ia32_cap = x86_read_arch_cap_msr();
> +
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> + return false;

Let's say why Intel is the special snowflake. Maybe:

	/*
	 * Intel does not document vulnerability information for old
	 * CPUs. This means that only Intel CPUs can have unknown
	 * vulnerability state.
	 */

> + /*
> + * CPU vulnerability is unknown when, hardware doesn't set the
> + * immunity bits and CPU is not in the known affected list.
> + */
> + if (!cpu_matches(cpu_vuln_blacklist, MMIO) &&
> + !arch_cap_mmio_immune(ia32_cap))
> + return true;
> + return false;
> +}
> +
> static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
> {
> u64 ia32_cap = x86_read_arch_cap_msr();
> @@ -1349,14 +1365,8 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
> cpu_matches(cpu_vuln_blacklist, SRBDS | MMIO_SBDS))
> setup_force_cpu_bug(X86_BUG_SRBDS);
>
> - /*
> - * Processor MMIO Stale Data bug enumeration
> - *
> - * Affected CPU list is generally enough to enumerate the vulnerability,
> - * but for virtualization case check for ARCH_CAP MSR bits also, VMM may
> - * not want the guest to enumerate the bug.
> - */
> - if (cpu_matches(cpu_vuln_blacklist, MMIO) &&
> + /* Processor MMIO Stale Data bug enumeration */
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> !arch_cap_mmio_immune(ia32_cap))
> setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);

Yeah, this is all looking a little clunky.

Maybe we just need a third state of cpu_has_bug() for all this and we
shouldn't try cramming it in the MMIO-specific code and diluting the
specificity of boot_cpu_has_bug().

Then the selection logic becomes simple:

	if (!arch_cap_mmio_immune(ia32_cap)) {
		if (cpu_matches(cpu_vuln_blacklist, MMIO))
			setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
		else if (c->x86_vendor == X86_VENDOR_INTEL)
			setup_force_unknown_bug(X86_BUG_MMIO...);
	}

... and then spit out the "Unknown" in the common code, just like the
treatment "Not affected" gets.

static ssize_t cpu_show_common(...)
{
	if (!boot_cpu_has_bug(bug))
		return sprintf(buf, "Not affected\n");
+
+	if (boot_cpu_unknown_bug(bug))
+		return sprintf(buf, "Unknown\n");
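
To make the third-state idea a bit more concrete, here is a rough
userspace sketch in plain C. It is not kernel code: enum bug_state,
struct cpu_info, classify_mmio() and show_common() are all made-up
names, and the three bools just stand in for arch_cap_mmio_immune(),
cpu_matches() and the vendor check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tri-state; nothing like this exists in the kernel today. */
enum bug_state {
	BUG_NOT_AFFECTED,
	BUG_UNKNOWN,
	BUG_AFFECTED,
};

/* Hypothetical stand-ins for the real kernel checks. */
struct cpu_info {
	bool immune;           /* models arch_cap_mmio_immune(ia32_cap) */
	bool in_affected_list; /* models cpu_matches(cpu_vuln_blacklist, MMIO) */
	bool vendor_intel;     /* models x86_vendor == X86_VENDOR_INTEL */
};

/* Same decision order as the selection logic suggested above. */
static enum bug_state classify_mmio(const struct cpu_info *c)
{
	if (c->immune)
		return BUG_NOT_AFFECTED;
	if (c->in_affected_list)
		return BUG_AFFECTED;
	if (c->vendor_intel)
		return BUG_UNKNOWN;
	return BUG_NOT_AFFECTED;
}

/*
 * Common "show" path: "Not affected" and "Unknown" are handled here,
 * only the affected case falls through to the bug-specific string.
 */
static int show_common(char *buf, size_t len, enum bug_state state,
		       const char *bug_specific)
{
	switch (state) {
	case BUG_NOT_AFFECTED:
		return snprintf(buf, len, "Not affected\n");
	case BUG_UNKNOWN:
		return snprintf(buf, len, "Unknown\n");
	default:
		return snprintf(buf, len, "%s\n", bug_specific);
	}
}
```

With this shape, an old Intel part that is neither immune nor in the
affected list comes out as "Unknown", while an unlisted part from any
other vendor stays "Not affected".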

Thoughts?