[PATCH 0/6] x86, mce: machine check recovery for applications

From: Tony Luck
Date: Tue Jan 03 2012 - 15:17:29 EST


This series adds code to recognise the machine check signature for
a recoverable error in the data path. Advanced SKUs of "Sandy Bridge"
server processors are the first to allow software recovery in this
case. The machine check handler saves the required information and
then calls the generic memory_failure() code to attempt graceful
error recovery (sending SIGBUS to the affected process(es)).
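
For anyone wondering what the receiving end looks like, here is a rough
userspace sketch of a SIGBUS handler that distinguishes the two memory
error codes Linux defines for siginfo (BUS_MCEERR_AR and BUS_MCEERR_AO).
It is not part of this series. The helper names describe_sigbus(),
on_sigbus() and install_sigbus_handler() are made up for illustration,
and a real application would do something smarter than exit.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Last reported error, recorded by the handler (illustrative only). */
static volatile sig_atomic_t last_si_code;
static void *volatile last_si_addr;

/* Hypothetical helper: map a SIGBUS si_code to a short label. */
static const char *describe_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "action required";
	case BUS_MCEERR_AO:
		return "action optional";
	default:
		return "other";
	}
}

/* Hypothetical handler: record the error, then decide whether to go on. */
static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	last_si_code = info->si_code;
	last_si_addr = info->si_addr;

	/*
	 * Assumption for this sketch: only the "action optional" case is
	 * safe to return from, because the bad data has not been consumed.
	 * For anything else, returning would re-run the faulting access.
	 */
	if (info->si_code != BUS_MCEERR_AO)
		_exit(EXIT_FAILURE);
}

/* Install the handler with SA_SIGINFO so si_code/si_addr are delivered. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

An application would call install_sigbus_handler() once at startup and
could use last_si_addr to work out which of its pages was lost.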

Updates since last version (December 15th)

Part1-4: unchanged

Part5: Changed the stub function for the CONFIG_MEMORY_FAILURE=n case to
BUG_ON() if it is handed an MF_ACTION_REQUIRED error (this would indicate
an error in the severity calculation). Dropped the "Memory error recovered"
message (enough chatter already).

Part6: Only return an ACTION_REQUIRED severity when the kernel is built
with CONFIG_MEMORY_FAILURE=y (i.e. when it has the code to take the action).
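
The Part5 and Part6 changes fit together roughly as in the standalone
model below. This is a userspace sketch, not the code from the patches:
every sketch_* name, the flag value, and the printed message are stand-ins
chosen for illustration, assert() plays the role of BUG_ON(), and the
severity used when recovery is unavailable is passed in as "fallback"
rather than guessed at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's severity levels and flag. */
enum sketch_severity {
	SKETCH_SEV_KEEP,
	SKETCH_SEV_ACTION_OPTIONAL,
	SKETCH_SEV_ACTION_REQUIRED,
	SKETCH_SEV_PANIC,
};

#define SKETCH_MF_ACTION_REQUIRED (1 << 1)	/* illustrative value */

/*
 * Part6 idea: a kernel without the recovery code never reports
 * "action required"; it reports whatever the remaining rules give.
 */
static enum sketch_severity
sketch_gate_severity(enum sketch_severity sev, bool have_memory_failure,
		     enum sketch_severity fallback)
{
	if (sev == SKETCH_SEV_ACTION_REQUIRED && !have_memory_failure)
		return fallback;
	return sev;
}

/*
 * Part5 idea: the CONFIG_MEMORY_FAILURE=n stub must never see an
 * "action required" request. If it does, the severity calculation
 * is broken, so stop immediately.
 */
static int sketch_memory_failure_stub(unsigned long pfn, int flags)
{
	assert(!(flags & SKETCH_MF_ACTION_REQUIRED));	/* models BUG_ON() */
	fprintf(stderr, "memory error in page 0x%lx ignored\n", pfn);
	return 0;
}
```

With the gate in place, the assert in the stub should be unreachable; it
is there to catch a future mistake in the severity rules.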

Whole series is available in:

git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git mce-recovery

Tony Luck (6):
  HWPOISON: clean up memory_failure() vs. __memory_failure()
  HWPOISON: Add code to handle "action required" errors.
  x86, mce: create helper function to save addr/misc when needed
  x86, mce: Add mechanism to safely save information in MCE handler
  x86, mce: handle "action required" errors
  x86, mce: Recognise machine check bank signature for data path error

 arch/x86/kernel/cpu/mcheck/mce-severity.c |   16 +++-
 arch/x86/kernel/cpu/mcheck/mce.c          |  179 ++++++++++++++++++++---------
 drivers/base/memory.c                     |    2 +-
 include/linux/mm.h                        |    4 +-
 mm/hwpoison-inject.c                      |    4 +-
 mm/madvise.c                              |    2 +-
 mm/memory-failure.c                       |   96 ++++++++--------
 7 files changed, 197 insertions(+), 106 deletions(-)

--
1.7.3.1
