Re: [PATCH v6 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

From: Jiri Olsa
Date: Sat Jan 27 2024 - 13:44:23 EST


On Sat, Jan 27, 2024 at 12:14:05AM +0900, Masami Hiramatsu wrote:
> On Thu, 25 Jan 2024 15:54:53 +0100
> Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> > On Fri, Jan 12, 2024 at 07:10:50PM +0900, Masami Hiramatsu (Google) wrote:
> > > Hi,
> > >
> > > Here is the 6th version of the series to re-implement the fprobe on
> > > function-graph tracer. The previous version is;
> > >
> > > https://lore.kernel.org/all/170290509018.220107.1347127510564358608.stgit@devnote2/
> > >
> > > This version fixes use-after-unregister bug and arm64 stack unwinding
> > > bug [13/36], add an improvement for multiple interrupts during push
> > > operation[20/36], keep SAVE_REGS until BPF and fprobe_event using
> > > ftrace_regs[26/36], also reorder the patches[30/36][31/36] so that new
> > > fprobe can switch to SAVE_ARGS[32/36] safely.
> > > This series also temporarily adds a DIRECT_CALLS bugfix[1/36], which
> > > should be pushed separatedly as a stable bugfix.
> > >
> > > There are some TODOs:
> > > - Add s390x and loongarch support to fprobe (multiple fgraph).
> > > - Fix to get the symbol address from ftrace entry address on arm64.
> > > (This should be done in BPF trace event)
> > > - Cleanup code, rename some terms(offset/index) and FGRAPH_TYPE_BITMAP
> > > part should be merged to FGRAPH_TYPE_ARRAY patch.
> >
> > hi,
> > I'm getting kasan bugs below when running bpf selftests on top of this
> > patchset.. I think it's probably the reason I see failures in some bpf
> > kprobe_multi/fprobe tests
> >
> > so far I couldn't find the reason.. still checking ;-)
>
> Thanks for reporting! Have you built the kernel with debuginfo? In that
> case, can you decode the line from the address?
>
> $ eu-addr2line -fi -e vmlinux ftrace_push_return_trace.isra.0+0x346
>
> This helps me a lot.

I had to recompile/regenerate the fault, it points in here:

ffffffff8149b390 <ftrace_push_return_trace.isra.0>:
...

current->ret_stack[rindex - 1] = val;
ffffffff8149b6b1: 48 8d bd 78 28 00 00 lea 0x2878(%rbp),%rdi
ffffffff8149b6b8: e8 63 e4 28 00 call ffffffff81729b20 <__asan_load8>
ffffffff8149b6bd: 48 8b 95 78 28 00 00 mov 0x2878(%rbp),%rdx
ffffffff8149b6c4: 41 8d 47 ff lea -0x1(%r15),%eax
ffffffff8149b6c8: 48 98 cltq
ffffffff8149b6ca: 4c 8d 24 c2 lea (%rdx,%rax,8),%r12
ffffffff8149b6ce: 4c 89 e7 mov %r12,%rdi
ffffffff8149b6d1: e8 ea e4 28 00 call ffffffff81729bc0 <__asan_store8>
---> ffffffff8149b6d6: 49 89 1c 24 mov %rbx,(%r12)
current->curr_ret_stack = index = rindex;
ffffffff8149b6da: 48 8d bd 6c 28 00 00 lea 0x286c(%rbp),%rdi
ffffffff8149b6e1: e8 9a e3 28 00 call ffffffff81729a80 <__asan_store4>
ffffffff8149b6e6: 44 89 bd 6c 28 00 00 mov %r15d,0x286c(%rbp)
ffffffff8149b6ed: e9 8d fd ff ff jmp ffffffff8149b47f <ftrace_push_return_trace.isra.0+0xef>
if (WARN_ON_ONCE(idx <= 0))


the dump is attached below (same address as in previous email)

jirka


---
[ 360.152200][ C3] BUG: KASAN: slab-out-of-bounds in ftrace_push_return_trace.isra.0+0x346/0x370
[ 360.153195][ C3] Write of size 8 at addr ffff8881a0e10ff8 by task kworker/3:4/728
[ 360.154101][ C3]
[ 360.154414][ C3] CPU: 3 PID: 728 Comm: kworker/3:4 Tainted: G OE 6.7.0+ #316 c9b0d53b3491b547d06b6b50629b74711600ddc9
[ 360.155679][ C3] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[ 360.156611][ C3] Workqueue: events free_obj_work
[ 360.157175][ C3] Call Trace:
[ 360.157561][ C3] <IRQ>
[ 360.157904][ C3] dump_stack_lvl+0xf6/0x180
[ 360.158404][ C3] print_report+0xc4/0x610
[ 360.158853][ C3] ? lock_release+0xba/0x760
[ 360.159375][ C3] ? __phys_addr+0x5/0x80
[ 360.159872][ C3] ? __phys_addr+0x33/0x80
[ 360.161309][ C3] kasan_report+0xbe/0xf0
[ 360.161940][ C3] ? ftrace_push_return_trace.isra.0+0x346/0x370
[ 360.162817][ C3] ? ftrace_push_return_trace.isra.0+0x346/0x370
[ 360.163518][ C3] ? __pfx_kernel_text_address+0x10/0x10
[ 360.164152][ C3] ? __kernel_text_address+0xe/0x40
[ 360.164715][ C3] ftrace_push_return_trace.isra.0+0x346/0x370
[ 360.165324][ C3] ? __pfx_kernel_text_address+0x10/0x10
[ 360.165940][ C3] function_graph_enter_ops+0xbb/0x2d0
[ 360.166555][ C3] ? __kernel_text_address+0xe/0x40
[ 360.167134][ C3] ? __pfx_function_graph_enter_ops+0x10/0x10
[ 360.167801][ C3] ? __pfx_function_graph_enter_ops+0x10/0x10
[ 360.168454][ C3] ? __pfx___kernel_text_address+0x10/0x10
[ 360.169086][ C3] ? __pfx_unwind_get_return_address+0x10/0x10
[ 360.169781][ C3] ftrace_graph_func+0x142/0x270
[ 360.170341][ C3] ? __pfx_kernel_text_address+0x10/0x10
[ 360.170960][ C3] ? orc_find.part.0+0x5/0x250
[ 360.171514][ C3] 0xffffffffa0568097
[ 360.171990][ C3] ? 0xffffffffa0568097
[ 360.172463][ C3] ? preempt_count_sub+0x5/0xc0
[ 360.172948][ C3] ? unwind_get_return_address+0x2a/0x50
[ 360.173512][ C3] ? unwind_get_return_address+0xf/0x50
[ 360.174068][ C3] ? orc_find.part.0+0x5/0x250
[ 360.174535][ C3] ? kernel_text_address+0x5/0x130
[ 360.175094][ C3] ? __kernel_text_address+0x5/0x40
[ 360.175623][ C3] kernel_text_address+0x5/0x130
[ 360.176118][ C3] __kernel_text_address+0xe/0x40
[ 360.176625][ C3] unwind_get_return_address+0x33/0x50
[ 360.177160][ C3] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 360.177805][ C3] arch_stack_walk+0x9e/0xf0
[ 360.178319][ C3] ? orc_find.part.0+0x5/0x250
[ 360.178821][ C3] ? rcu_do_batch+0x396/0xb10
[ 360.179305][ C3] stack_trace_save+0x91/0xd0
[ 360.179807][ C3] ? __pfx_stack_trace_save+0x10/0x10
[ 360.180389][ C3] ? stack_trace_save+0x5/0xd0
[ 360.180908][ C3] kasan_save_stack+0x1c/0x40
[ 360.181419][ C3] ? kasan_save_stack+0x1c/0x40
[ 360.181948][ C3] ? kasan_save_track+0x10/0x30
[ 360.182531][ C3] ? kasan_save_free_info+0x3b/0x60
[ 360.183157][ C3] ? __kasan_slab_free+0x122/0x1c0
[ 360.183713][ C3] ? kmem_cache_free+0x19d/0x460
[ 360.184241][ C3] ? rcu_do_batch+0x396/0xb10
[ 360.184781][ C3] ? rcu_core+0x3b2/0x5f0
[ 360.185289][ C3] ? __do_softirq+0x13b/0x64d
[ 360.185814][ C3] ? __irq_exit_rcu+0xe4/0x190
[ 360.186366][ C3] ? irq_exit_rcu+0xa/0x30
[ 360.186869][ C3] ? sysvec_call_function+0x8f/0xb0
[ 360.187479][ C3] ? asm_sysvec_call_function+0x16/0x20
[ 360.188098][ C3] ? ftrace_push_return_trace.isra.0+0x12c/0x370
[ 360.188784][ C3] ? function_graph_enter_ops+0xbb/0x2d0
[ 360.189413][ C3] ? ftrace_graph_func+0x142/0x270
[ 360.189985][ C3] ? 0xffffffffa0568097
[ 360.191352][ C3] ? function_graph_enter_ops+0x2b9/0x2d0
[ 360.192148][ C3] ? __kasan_slab_free+0x3b/0x1c0
[ 360.192852][ C3] ? __pfx_function_graph_enter_ops+0x10/0x10
[ 360.193545][ C3] ? __pfx___phys_addr+0x10/0x10
[ 360.194115][ C3] ? lockdep_hardirqs_on_prepare+0xe/0x250
[ 360.194778][ C3] ? preempt_count_sub+0x5/0xc0
[ 360.195373][ C3] ? ftrace_graph_func+0x173/0x270
[ 360.195937][ C3] ? __pfx___phys_addr+0x10/0x10
[ 360.196461][ C3] ? rcu_do_batch+0x396/0xb10
[ 360.196972][ C3] ? 0xffffffffa0568097
[ 360.197481][ C3] ? __pfx___debug_check_no_obj_freed+0x10/0x10
[ 360.198175][ C3] ? 0xffffffffa0568097
[ 360.198675][ C3] ? lock_acquire+0xc6/0x490
[ 360.199247][ C3] kasan_save_track+0x10/0x30
[ 360.199803][ C3] kasan_save_free_info+0x3b/0x60
[ 360.200474][ C3] __kasan_slab_free+0x122/0x1c0
[ 360.201078][ C3] ? rcu_do_batch+0x396/0xb10
[ 360.201567][ C3] kmem_cache_free+0x19d/0x460
[ 360.202093][ C3] ? __pfx_free_object_rcu+0x10/0x10
[ 360.202658][ C3] rcu_do_batch+0x396/0xb10
[ 360.203193][ C3] ? __pfx_rcu_do_batch+0x10/0x10
[ 360.203765][ C3] ? sched_clock+0xc/0x30
[ 360.204250][ C3] ? rcu_is_watching+0x34/0x60
[ 360.204758][ C3] rcu_core+0x3b2/0x5f0
[ 360.205245][ C3] ? rcu_is_watching+0x34/0x60
[ 360.205745][ C3] __do_softirq+0x13b/0x64d
[ 360.206260][ C3] __irq_exit_rcu+0xe4/0x190
[ 360.206802][ C3] irq_exit_rcu+0xa/0x30
[ 360.207334][ C3] sysvec_call_function+0x8f/0xb0
[ 360.207929][ C3] </IRQ>
[ 360.208330][ C3] <TASK>
[ 360.208687][ C3] asm_sysvec_call_function+0x16/0x20