Re: [RFC PATCH v1 1/1] tracing/kprobe: Add multi-probe support for 'perf_kprobe' PMU

From: Google
Date: Tue Aug 22 2023 - 20:37:16 EST


Hi,

On Mon, 21 Aug 2023 14:55:14 +0200
Francis Laniel <flaniel@xxxxxxxxxxxxxxxxxxx> wrote:

> > Could you tell me how do you use this feature, for what perpose?
>
> Sure (I think I detailed this in the cover letter but I only sent it to the
> "main" mailing list and not the tracing one, sorry for this inconvenience)!
>
> Basically, I was adding NTFS tracing to an existing tool which monitors slow
> I/Os using BPF [1].
> To test the tool, I compiled a kernel with both NTFS module built-in and
> figured out the write operations when done on ntfs3 were missing from the
> output of the tool.
> The problem comes from the library I use in the tool which does not handle
> well when it exists different symbols with the same name.
> Contrary to perf, which only handles kprobes through sysfs, the library
> handles it in both way (sysfs and PMU) with a preference for PMU when
> available [2].
>
> After some discussion, I thought there could be a way to handle this
> automatically in the kernel when using PMU kprobes, hence this patch.
> I totally understand the case I described above is really a corner one, but I
> thought this feature could be useful for other people.
> In the case of the library itself, we could indeed find the address in /proc/
> kallsyms but it would mean having CAP_SYS_ADMIN which is not forcefully
> something we want to enforce.
> Also, if we need to read /boot/vmlinuz or /boot/System.map we also need to be
> root as these files often belong to root and cannot be read by others.
> So, this patch solves the above problem while not needing specific capabilities
> as the kernel will solve it for us.

Thanks for the explanation. I got the background, and still have some questions.

- Is the analysis tool really necessary to be used by users other than
CAP_SYS_ADMIN? Even if it is useful, I still doubt CAP_PERFMON is safer
than CAP_SYS_ADMIN, because BPF program can access any kernel register.

- My concern about this solution (enabling kprobe PMU on all symbols which
have the same name) makes it hard to run the same BPF program on it.
This is because symbols with the same name do not necessarily have the
same arguments (theoretically). Also, the BPF will trace unwanted symbols
at unwanted timing.

- Can you expand that library to handle the same name symbols differently?
I think this should be done in the user space, or in the kallsyms like
storing symbols with source line information.

I understand this demand, but solving that with probing *all* symbols seems
like a brute force solution and may cause another problem later.

But this is a good discussion item. Last month Alessandro sent a script which
makes such symbols unique. Current problem is that the numbering is not enough
to identify which one is from which source code.

https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@xxxxxxxxx/

>
> > If you just need to trace/profile a specific function which has the same
> > name symbols, you might be better to use `perf probe` +
> > `/sys/kernel/tracing` or `perf record -e EVENT`.
> >
> > Or if you need to run it with CAP_PERFMON, without CAP_SYS_ADMIN,
> > we need to change a userspace tool to find the correct address and
> > pass it to the perf_event_open().
> >
> > > > > Added new events:
> > > > > probe:ntfs_file_write_iter (on ntfs_file_write_iter)
> > > > > probe:ntfs_file_write_iter (on ntfs_file_write_iter)
> > > > >
> > > > > You can now use it in all perf tools, such as:
> > > > > perf record -e probe:ntfs_file_write_iter -aR sleep 1
> > > > >
> > > > > root@vm-amd64:~# cat /sys/kernel/tracing/kprobe_events
> > > > > p:probe/ntfs_file_write_iter _text+5088544
> > > > > p:probe/ntfs_file_write_iter _text+5278560
> > > > >
> > > > > > Thought?
> > > > >
> > > > > This contribution is basically here to sort of mimic what perf does
> > > > > but
> > > > > with PMU kprobes, as this is not possible to write in a sysfs file
> > > > > with
> > > > > this type of probe.
> > > >
> > > > OK, I see it is for BPF only. Maybe BPF program can filter correct one
> > > > to access the argument etc.
> > >
> > > I am not sure I understand, can you please precise?
> > > The eBPF program will be run when the kprobe will be triggered, so if the
> > > kprobe is armed for the function (e.g. old ntfs_file_write_iter()), the
> > > eBPF program will never be called.
> >
> > As I said above, it is userspace BPF loader issue, because it has to specify
> > the correct address via unique symbol + offset, instead of attaching all of
> > them. I think that will be more side-effects.
> >
> > But anyway, thanks for pointing this issue. I should fix kprobe event to
> > reject the symbols which is not unique. That should be pointed by other
> > unique symbols.
>
> You are welcome and I thank you for the discussion.
> Can you please precise more what you think about "reject the symbols which is
> not unique"?
> Basically something like this:

Yes, that's what I said.

> struct trace_event_call *
> create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> bool is_return)
> {
> ...
> if (!addr && func) {

if (func) { /* because anyway if user specify "func" we have to solve
the symbol address */

> array.addrs = NULL;
> array.size = 0;
> ret = kallsyms_on_each_match_symbol(add_addr, func, &array);
> if (ret)
> goto error_free;
>
> if (array.size != 1) {
> /*
> * Function name corresponding to several symbols must
> * be passed by address only.
> */
> return -EINVAL;

This case may return a unique error code so that the caller can notice
the problem.

Thank you,

> }
> }



> ...
> }
> ?
> If my understanding is correct, I think I can write a patch to achieve this.
>

>
> Best regards.
> ---
> [1]: https://github.com/inspektor-gadget/inspektor-gadget/pull/1879
> [2]: https://github.com/cilium/ebpf/blob/
> 270c859894bd38cdd0c7783317b16343409e4df8/link/kprobe.go#L165-L191
>
>


--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>