Re: [PATCH 5.10] kprobes/x86: Fix kprobe debug exception handling logic

From: Greg KH
Date: Fri Jun 30 2023 - 01:21:36 EST


On Fri, Jun 30, 2023 at 10:08:45AM +0800, Li Huafei wrote:
> We get the following crash caused by a null pointer access:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> RIP: 0010:resume_execution+0x35/0x190
> ...
> Call Trace:
> <#DB>
> kprobe_debug_handler+0x41/0xd0
> exc_debug+0xe5/0x1b0
> asm_exc_debug+0x19/0x30
> RIP: 0010:copy_from_kernel_nofault.part.0+0x55/0xc0
> ...
> </#DB>
> process_fetch_insn+0xfb/0x720
> kprobe_trace_func+0x199/0x2c0
> ? kernel_clone+0x5/0x2f0
> kprobe_dispatcher+0x3d/0x60
> aggr_pre_handler+0x40/0x80
> ? kernel_clone+0x1/0x2f0
> kprobe_ftrace_handler+0x82/0xf0
> ? __se_sys_clone+0x65/0x90
> ftrace_ops_assist_func+0x86/0x110
> ? rcu_nocb_try_bypass+0x1f3/0x370
> 0xffffffffc07e60c8
> ? kernel_clone+0x1/0x2f0
> kernel_clone+0x5/0x2f0
>
> The analysis reveals that kprobe and hardware breakpoints conflict in
> the use of debug exceptions.
>
> Suppose a hardware breakpoint is set on a memory address and a kprobe
> event also fetches the memory at that address. When the kprobe
> triggers, it reads the memory and thereby triggers the hardware
> breakpoint. Because kprobe handles debug exceptions before the
> hardware breakpoint code does, kprobe incorrectly assumes that the
> exception was raised by a kprobe.
>
> Note that since mainline commit 6256e668b7af ("x86/kprobes: Use
> int3 instead of debug trap for single-step"), kprobe no longer uses
> the debug trap, which avoids the conflict with hardware breakpoints
> described here. That commit was meant to remove the IRET that returns
> to the kernel, not to fix this problem. Applying it to older kernels
> also produces a number of merge conflicts, so fixing the problem
> directly in older kernels is probably the better option.
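
For readers following the thread, the ordering problem described in the
quoted analysis can be modelled in user space. This is only a sketch under
stated assumptions: every name below (model_kprobe, current_kprobe,
kprobe_claims_naive, kprobe_claims_strict, dispatch_debug_exception) is
hypothetical and is not a kernel API, and the "strict" check is one possible
way to tell the two causes apart, not the logic of the posted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum db_cause { DB_SINGLE_STEP, DB_HW_BREAKPOINT };
enum db_owner { OWNER_NONE, OWNER_KPROBE, OWNER_HW_BREAKPOINT };

struct model_kprobe {
    const char *name;
    bool single_stepping; /* true only while the probe single-steps */
};

/* Models "a kprobe handler is currently running on this CPU". */
static struct model_kprobe *current_kprobe;

/*
 * Behaviour described in the report: any debug exception that arrives
 * while a kprobe is running is assumed to belong to the kprobe.
 */
static bool kprobe_claims_naive(enum db_cause cause)
{
    (void)cause; /* the cause is never inspected */
    return current_kprobe != NULL;
}

/*
 * Assumed stricter check (illustration only): claim the exception only
 * if the probe is really single-stepping and the cause is a single step.
 */
static bool kprobe_claims_strict(enum db_cause cause)
{
    return current_kprobe != NULL &&
           current_kprobe->single_stepping &&
           cause == DB_SINGLE_STEP;
}

/* The kprobe check runs first, as in the ordering the report describes. */
static enum db_owner dispatch_debug_exception(enum db_cause cause,
                                              bool (*kprobe_claims)(enum db_cause))
{
    if (kprobe_claims(cause))
        return OWNER_KPROBE;
    if (cause == DB_HW_BREAKPOINT)
        return OWNER_HW_BREAKPOINT;
    return OWNER_NONE;
}
```

In this model, with current_kprobe pointing at a probe that is not
single-stepping, dispatch_debug_exception(DB_HW_BREAKPOINT,
kprobe_claims_naive) returns OWNER_KPROBE, which mirrors the misattribution
in the report, while the same call with kprobe_claims_strict returns
OWNER_HW_BREAKPOINT.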

What is the list of commits that it would take to resolve this in these
kernels? We would almost always prefer to do that instead of taking
changes that are not upstream.

thanks,

greg k-h