Re: [PATCH v7 00/11] tracing: Add fprobe events

From: Jiri Olsa
Date: Thu Apr 27 2023 - 07:59:06 EST


On Thu, Apr 27, 2023 at 10:17:45AM +0900, Masami Hiramatsu (Google) wrote:
> Hi,
>
> Here is the 7th version of improve fprobe and add a basic fprobe event
> support for ftrace (tracefs) and perf. Here is the previous version.
>
> https://lore.kernel.org/all/168234755610.2210510.12133559313738141202.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> This version is rebased on the latest linux-trace/for-next, fixes
> bpf_get_btf_vmlinux() return value check [6/11] and adds new BTF $retval
> type support [9/11] (I forgot to implement this feature last time).
> Also updates according to the BTF $retval type support.
>
> You can also get this series from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git topic/fprobe-event-ext
>
> With this fprobe events, we can continue to trace function entry/exit
> even if the CONFIG_KPROBES_ON_FTRACE is not available. Since
> CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS,
> it is not available if the architecture only supports
> CONFIG_DYNAMIC_FTRACE_WITH_ARGS (e.g. arm64). And that means kprobe
> events can not probe function entry/exit effectively on such architecture.
> But this problem can be solved if the dynamic events supports fprobe events
> because fprobe events doesn't use kprobe but ftrace via fprobe.
>
> FPROBE EVENTS
> =============
>
> Fprobe events allows user to add new events on the entry and exit of kernel
> functions (which can be ftraced). Unlike kprobe events, the fprobe events
> can only probe the function entry and exit, and it can only trace the
> function args, return value, and stacks. (no registers)
> For probing function body, users can continue to use the kprobe events.
>
> The tracepoint probe events (tprobe events) also allows user to add new
> events dynamically on the tracepoint. Most of the tracepoint already has
> trace-events, so this feature is useful if you only want to know a
> specific parameter, or trace the tracepoints which has no trace-events
> (e.g. sched_*_tp tracepoints only exposes the tracepoints.)
>
> The fprobe events syntax is;
>
> f[:[GRP/][EVENT]] FUNCTION [FETCHARGS]
> f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS]
>
> And tracepoint probe events syntax is;
>
> t[:[GRP/][EVENT]] TRACEPOINT [FETCHARGS]
>
> This series includes BTF argument support for fprobe/tracepoint events,
> and kprobe events. This allows us to fetch a specific function parameter
> by name, and all parameters by '$$args'.

are you planning to fetch and display more complicated types in future?
like strings or dereferencing struct field from argument pointer

> Note that enabling this feature, you need to enable CONFIG_BPF_SYSCALL and
> confirm that your arch supports CONFIG_HAVE_FUNCTION_ARG_ACCESS_API.
>
> E.g.
>
> # echo 't kfree ptr' >> dynamic_events
> # echo 'f kfree object' >> dynamic_events
> # cat dynamic_events
> t:tracepoints/kfree kfree ptr=ptr
> f:fprobes/kfree__entry kfree object=object
> # echo 1 > events/fprobes/enable
> # echo 1 > events/tracepoints/enable
> # echo > trace
> # head -n 20 trace | tail
> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> # | | | ||||| | |
> tail-84 [000] ..... 1324.561958: kfree__entry: (kfree+0x4/0x140) object=0xffff888006383c00
> tail-84 [000] ...1. 1324.561961: kfree: (__probestub_kfree+0x4/0x10) ptr=0xffff888006383c00
> tail-84 [000] ..... 1324.561988: kfree__entry: (kfree+0x4/0x140) object=0x0
> tail-84 [000] ...1. 1324.561988: kfree: (__probestub_kfree+0x4/0x10) ptr=0x0
> tail-84 [000] ..... 1324.561989: kfree__entry: (kfree+0x4/0x140) object=0xffff88800671e600
> tail-84 [000] ...1. 1324.561989: kfree: (__probestub_kfree+0x4/0x10) ptr=0xffff88800671e600
> tail-84 [000] ..... 1324.562368: kfree__entry: (kfree+0x4/0x140) object=0xffff8880065e0580
> tail-84 [000] ...1. 1324.562369: kfree: (__probestub_kfree+0x4/0x10) ptr=0xffff8880065e0580

I checked with perf and record/stat/script seem to work fine with this

# ./perf record -e 'fprobes:myprobe'
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.162 MB perf.data (1 samples) ]

# ./perf script
systemd-oomd 479 [001] 14550.722079: fprobes:myprobe: (ffffffff81505be0) filename=0x557b033662b0

perf trace seems to be off with __probe_ip for some reason:

# ./perf trace -e 'fprobes:myprobe'
0.000 systemd-oomd/479 fprobes:myprobe(__probe_ip: -2125440032, filename: 93986839069680)
1.189 systemd-oomd/479 fprobes:myprobe(__probe_ip: -2125440032, filename: 93986839070144)

but it's probably perf issue

thanks,
jirka