[PATCH] x86/unwind/orc: fix the check of stack addresses

From: Muchun Song
Date: Fri Aug 13 2021 - 06:38:17 EST


On one of our servers we hit a kernel panic with the following call trace:

BUG: stack guard page was hit at 000000001ff76a9e (stack is
0000000095d6f9f7..00000000dd56db03)
kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
RIP: 0010:unwind_next_frame+0x34e/0x570
RSP: 0000:fffffe000221a8f0 EFLAGS: 00010002
RAX: 0000000000000001 RBX: fffffe000221a930 RCX: 0000000000000001
RDX: 0000000000000010 RSI: ffff8a01c3b0adc0 RDI: ffffa6b9f75c7fc8
RBP: 0000000000000004 R08: ffffffff9b200982 R09: ffffffff9bc48718
R10: ffffffff9bc48714 R11: 0000000000000014 R12: ffffffff9bdf54fa
R13: ffffffff9b20098c R14: fffffe0002213ff0 R15: ffffa6b9f75c7fc8
FS: 00007f1ea9f0c700(0000) GS:ffff8a4c4f7c0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa6b9f75c8048 CR3: 00000025c7064000 CR4: 0000000000340ee0
Call Trace:
<NMI>
perf_callchain_kernel+0x125/0x140
? interrupt_entry+0xac/0xc3
get_perf_callchain+0x113/0x280
perf_callchain+0x6f/0x80
perf_prepare_sample+0x87/0x510
perf_event_output_forward+0x2a/0x80
? sched_clock+0x5/0x10
? sched_clock_cpu+0xc/0xa0
? arch_perf_update_userpage+0xd0/0xe0
__perf_event_overflow+0x4f/0xf0
perf_ibs_handle_irq+0x37d/0x4e0
? interrupt_entry+0xac/0xc3
? interrupt_entry+0xac/0xc3
? __set_pte_vaddr+0x32/0x50
? __set_pte_vaddr+0x32/0x50
? set_pte_vaddr+0x3c/0x60
? __native_set_fixmap+0x24/0x30
? native_set_fixmap+0x40/0x60
? ghes_copy_tofrom_phys+0x99/0x130
? apei_read+0x90/0xb0
? interrupt_entry+0xac/0xc3
? __ghes_peek_estatus.isra.15+0x51/0xc0
? perf_ibs_nmi_handler+0x34/0x56
? sched_clock+0x5/0x10
perf_ibs_nmi_handler+0x34/0x56
nmi_handle+0x70/0x170
default_do_nmi+0x4e/0x100
do_nmi+0x156/0x1a0
end_repeat_nmi+0x16/0x50

CR2 holds the faulting address, 0xffffa6b9f75c8048. The stack of the
current task spans [0xffffa6b9f75c4000, 0xffffa6b9f75c7fff], so the
faulting address lies 0x48 bytes past the end of the stack, in the guard
page, which is why the kernel panicked.

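perf_callchain_kernel()
  unwind_next_frame()
    deref_stack_regs(state, addr, ip, sp)

static bool deref_stack_regs(struct unwind_state *state, unsigned long addr,
			     unsigned long *ip, unsigned long *sp)
{
	struct pt_regs *regs = (struct pt_regs *)addr;

	if (!stack_access_ok(state, addr, sizeof(struct pt_regs)))
		return false;

	*ip = READ_ONCE_NOCHECK(regs->ip);	/* this read faults (CR2) */
	*sp = READ_ONCE_NOCHECK(regs->sp);
	return true;
}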

Using the crash tool we can inspect @state; both @addr and @regs are
0xffffa6b9f75c7fc8.

crash> struct unwind_state.stack_info fffffe000221a930 -x
stack_info = {
type = STACK_TYPE_TASK,
begin = 0xffffa6b9f75c4000,
end = 0xffffa6b9f75c8000,
next_sp = 0x0
}

stack_access_ok() is meant to verify that the whole range
[0xffffa6b9f75c7fc8, 0xffffa6b9f75c7fc8 + sizeof(struct pt_regs)) lies on
a valid stack. sizeof(struct pt_regs) is 168 (0xa8), so the range ends at
0xffffa6b9f75c8070, beyond the stack end of 0xffffa6b9f75c8000. Yet the
check passes: on_stack() rejects the range, but stack_access_ok() then
falls back to get_stack_info(), which only checks whether @addr itself is
within a valid stack range and ignores @len. Since @addr is on the task
stack, get_stack_info() succeeds and stack_access_ok() returns true. Fix
this by rechecking the full range with on_stack() after get_stack_info()
returns.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
---
arch/x86/kernel/unwind_orc.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 187a86e0e753..54c3037d2687 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -336,11 +336,13 @@ static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
 	struct stack_info *info = &state->stack_info;
 	void *addr = (void *)_addr;
 
-	if (!on_stack(info, addr, len) &&
-	    (get_stack_info(addr, state->task, info, &state->stack_mask)))
+	if (on_stack(info, addr, len))
+		return true;
+
+	if (get_stack_info(addr, state->task, info, &state->stack_mask))
 		return false;
 
-	return true;
+	return on_stack(info, addr, len);
 }
 
 static bool deref_stack_reg(struct unwind_state *state, unsigned long addr,
--
2.11.0