Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it

From: Borislav Petkov
Date: Tue Oct 21 2014 - 16:29:14 EST


On Thu, Oct 09, 2014 at 02:01:06PM -0500, Aravind Gopalakrishnan wrote:
> I actually agree with this approach. So no argument:)

Ok, thanks, here's a patch.

Btw, I'm pushing the whole queue to a ras-for-3.19 branch at
https://git.kernel.org/cgit/linux/kernel/git/bp/bp.git if you'd like to
take a look and see whether we haven't forgotten anything before I send
it to tip guys.

Thanks.

---
From: Borislav Petkov <bp@xxxxxxx>
Subject: [PATCH] x86, MCE, AMD: Drop software-defined bank in error thresholding

Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.

Digging through git history, it pointed to

95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors")

which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.

Save us a couple of MSR reads while at it.

Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@xxxxxxx>
Link: https://lkml.kernel.org/r/5435B206.60402@xxxxxxx
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
---
arch/x86/include/asm/mce.h | 1 -
arch/x86/kernel/cpu/mcheck/mce_amd.c | 5 ++---
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 958b90f761e5..276392f121fb 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -78,7 +78,6 @@
/* Software defined banks */
#define MCE_EXTENDED_BANK 128
#define MCE_THERMAL_BANK (MCE_EXTENDED_BANK + 0)
-#define K8_MCE_THRESHOLD_BASE (MCE_EXTENDED_BANK + 1)

#define MCE_LOG_LEN 32
#define MCE_LOG_SIGNATURE "MACHINECHECK"
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 9af7bd74828b..6606523ff1c1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -318,10 +318,9 @@ static void amd_threshold_interrupt(void)

log:
mce_setup(&m);
- rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
- rdmsrl(address, m.misc);
rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
- m.bank = K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;
+ m.misc = ((u64)high << 32) | low;
+ m.bank = bank;
mce_log(&m);

wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
--
2.0.0


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/