[Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline

From: Ashok Raj
Date: Thu Sep 03 2015 - 13:18:32 EST


During CPU offline, or during suspend/resume operations, it's not safe to
clear MCi_CTL. These MSRs are either thread scoped (private to a single
thread), core scoped (private to the threads in that core), or socket
scoped, i.e. visible and controllable from all threads in the socket.

When we clear these controls during CPU_OFFLINE, offlining just a single CPU
stops error signaling for all socket-wide resources, e.g. the LLC and iMC.

This is true for Intel CPUs. But there seems to be some history suggesting
that other processors may require these to be turned off on every CPU offline.

Intel Software Guard Extensions (SGX) will be disabled when these controls
are cleared, as a security measure. This patch enables SGX to work across
suspend/resume.

- Consolidated some code so it can be shared.
- Minor changes to some prototypes to fit usage.
- Left handling the same for non-Intel CPU models to avoid any unknown regressions.

Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
Tested-by: Serge Ayoun <serge.ayoun@xxxxxxxxx>
---
arch/x86/kernel/cpu/mcheck/mce.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d350858..5498a79 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2100,7 +2100,7 @@ int __init mcheck_init(void)
* Disable machine checks on suspend and shutdown. We can't really handle
* them later.
*/
-static int mce_disable_error_reporting(void)
+static void mce_disable_error_reporting(void)
{
int i;

@@ -2110,17 +2110,40 @@ static int mce_disable_error_reporting(void)
if (b->init)
wrmsrl(MSR_IA32_MCx_CTL(i), 0);
}
- return 0;
+ return;
+}
+
+static void _vendor_disable_error_reporting(void)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ switch (c->x86_vendor) {
+ case X86_VENDOR_INTEL:
+ /*
+ * Don't clear on Intel CPUs. Some of these MSRs are
+ * socket-wide. Disabling them for just a single CPU offline
+ * is bad, since it will inhibit reporting for all shared
+ * resources, e.g. the LLC and iMC.
+ */
+ break;
+ default:
+ /*
+ * Disable MCE reporting for all other CPU vendors.
+ * Don't want to break functionality on those.
+ */
+ mce_disable_error_reporting();
+ }
}

static int mce_syscore_suspend(void)
{
- return mce_disable_error_reporting();
+ _vendor_disable_error_reporting();
+ return 0;
}

static void mce_syscore_shutdown(void)
{
- mce_disable_error_reporting();
+ _vendor_disable_error_reporting();
}

/*
@@ -2400,19 +2423,14 @@ static void mce_device_remove(unsigned int cpu)
static void mce_disable_cpu(void *h)
{
unsigned long action = *(unsigned long *)h;
- int i;

if (!mce_available(raw_cpu_ptr(&cpu_info)))
return;

if (!(action & CPU_TASKS_FROZEN))
cmci_clear();
- for (i = 0; i < mca_cfg.banks; i++) {
- struct mce_bank *b = &mce_banks[i];

- if (b->init)
- wrmsrl(MSR_IA32_MCx_CTL(i), 0);
- }
+ _vendor_disable_error_reporting();
}

static void mce_reenable_cpu(void *h)
--
2.4.3
