Re: mce: Add errata workaround for Skylake SKX37

From: Luck, Tony
Date: Tue Nov 02 2021 - 15:55:43 EST


On Fri, Oct 29, 2021 at 04:57:59PM -0400, Dave Jones wrote:
> Erratum SKX37 is word-for-word identical to the other errata listed in
> this workaround. I happened to notice this after investigating a CMCI
> storm on a Skylake host. While I can't confirm this was the root cause,
> spurious corrected errors do sound like a likely suspect.
>
> Signed-off-by: Dave Jones <davej@xxxxxxxxxxxxxxxxx>

Needs:

Fixes: 2976908e4198 ("x86/mce: Do not log spurious corrected mce errors")
Cc: <stable@xxxxxxxxxxxxxxx>

otherwise:

Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>

>
> diff --git arch/x86/kernel/cpu/mce/intel.c arch/x86/kernel/cpu/mce/intel.c
> index acfd5d9f93c6..bb9a46a804bf 100644
> --- arch/x86/kernel/cpu/mce/intel.c
> +++ arch/x86/kernel/cpu/mce/intel.c
> @@ -547,12 +547,13 @@ bool intel_filter_mce(struct mce *m)
>  {
>  	struct cpuinfo_x86 *c = &boot_cpu_data;
>
> -	/* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
> +	/* MCE errata HSD131, HSM142, HSW131, BDM48, HSM142 and SKX37 */
>  	if ((c->x86 == 6) &&
>  	    ((c->x86_model == INTEL_FAM6_HASWELL) ||
>  	     (c->x86_model == INTEL_FAM6_HASWELL_L) ||
>  	     (c->x86_model == INTEL_FAM6_BROADWELL) ||
> -	     (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
> +	     (c->x86_model == INTEL_FAM6_HASWELL_G) ||
> +	     (c->x86_model == INTEL_FAM6_SKYLAKE_X)) &&
>  	    (m->bank == 0) &&
>  	    ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
>  		return true;
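
For anyone reading along, here is a rough userspace sketch of what the
bank/status half of that test matches. It is not kernel code: the struct
mce_sample type and the is_spurious_signature() helper are hypothetical
names made up for illustration, and the CPU model check is left out. The
mask keeps bit 63 (VAL), bit 61 (UC) and the low 32 bits (MSCOD and
MCACOD) of the status register, so the comparison matches a valid,
corrected error in bank 0 with MSCOD 0x000f and MCACOD 0x0005, and
ignores every other status bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of IA32_MCi_STATUS that the filter inspects. */
#define STATUS_VAL   (1ULL << 63)   /* register holds a valid error */
#define STATUS_UC    (1ULL << 61)   /* error was uncorrected */
#define STATUS_CODES 0xffffffffULL  /* MSCOD (31:16) and MCACOD (15:0) */

/* Hypothetical stand-in for the two struct mce fields the filter reads. */
struct mce_sample {
	uint8_t  bank;
	uint64_t status;
};

/*
 * Hypothetical helper: the bank/status part of the check only,
 * without the family/model test done in intel_filter_mce().
 */
static bool is_spurious_signature(const struct mce_sample *m)
{
	/* Same constants as in the patch, built from named bits. */
	const uint64_t mask  = STATUS_VAL | STATUS_UC | STATUS_CODES; /* 0xa0000000ffffffff */
	const uint64_t match = STATUS_VAL | 0x000f0005ULL;            /* 0x80000000000f0005 */

	return m->bank == 0 && (m->status & mask) == match;
}
```

Bits outside the mask (OVER, EN, the corrected error count, and so on)
do not affect the result, which is why the test is written as a masked
compare rather than a plain equality on the whole register.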