[PATCH tip:ras/core 3/3] x86/MCE: Make correctable error detection look at the Deferred bit

From: Borislav Petkov
Date: Mon Dec 18 2017 - 06:37:35 EST


From: Yazen Ghannam <yazen.ghannam@xxxxxxx>

AMD systems may log Deferred errors. These are errors that are uncorrected
but which do not need immediate action. The MCA_STATUS[UC] bit may not be
set for Deferred errors.

Flag the error as not correctable when MCA_STATUS[Deferred] is set and
do not feed it into the Correctable Errors Collector.

Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # 4.13.x
Cc: Tony Luck <tony.luck@xxxxxxxxx>
Cc: linux-edac <linux-edac@xxxxxxxxxxxxxxx>
Cc: x86-ml <x86@xxxxxxxxxx>
Link: http://lkml.kernel.org/r/20171212165143.27475-1-Yazen.Ghannam@xxxxxxx
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
---
arch/x86/kernel/cpu/mcheck/mce.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 321c7a80be66..1b2c11473376 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -528,6 +528,17 @@ bool mce_is_memory_error(struct mce *m)
}
EXPORT_SYMBOL_GPL(mce_is_memory_error);

+static bool mce_is_correctable(struct mce *m)
+{
+ if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
+ return false;
+
+ if (m->status & MCI_STATUS_UC)
+ return false;
+
+ return true;
+}
+
static bool cec_add_mce(struct mce *m)
{
if (!m)
@@ -535,7 +546,7 @@ static bool cec_add_mce(struct mce *m)

/* We eat only correctable DRAM errors with usable addresses. */
if (mce_is_memory_error(m) &&
- !(m->status & MCI_STATUS_UC) &&
+ mce_is_correctable(m) &&
mce_usable_address(m))
if (!cec_add_elem(m->addr >> PAGE_SHIFT))
return true;
--
2.13.0