Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus

From: Helge Deller
Date: Mon Aug 21 2023 - 08:00:32 EST


On 8/21/23 12:50, Will Deacon wrote:
On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
When a process tries to access a page that is already offline

How does this happen?

the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
signal is typically handled by a registered sigbus handler in the
process. However, if the process does not have a registered sigbus
handler, it is important for end users to be informed about what
happened.

To address this, add an error message similar to those implemented on
the x86, powerpc, and parisc platforms.

Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
---
arch/arm64/mm/fault.c | 2 ++
arch/parisc/mm/fault.c | 5 ++---
arch/x86/mm/fault.c | 3 +--
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 3fe516b32577..38e2186882bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
unsigned int lsb;

+ pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+ current->comm, current->pid, far);
lsb = PAGE_SHIFT;
if (fault & VM_FAULT_HWPOISON_LARGE)
lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));

Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
and there's plenty of code in memory-failure.c for handling poisoned pages
reported by e.g. GHES. I don't think dumping extra messages in dmesg from
the arch code really adds anything.

I added the parisc hunk in commit 606f95e42558 due to the memory fault injections by the LTP
testsuite (madvise07). Not sure if there were any other kernel messages when this happened.

Helge