Re: Dumping a struct to a buffer in an strace like style using BTF

From: Alan Maguire
Date: Thu Aug 17 2023 - 04:58:36 EST


Hi Arnaldo,

On 17/08/2023 01:28, Arnaldo Carvalho de Melo wrote:
> Hi Alan,
>
> Something I planned to do since forever is to get the contents
> of syscall args and print in 'perf trace' using BTF, right now we have
> things like:
>
> [root@quaco ~]# perf trace -e connect* ssh localhost
> 0.000 ( 0.342 ms): ssh/438068 connect(fd: 3, uservaddr: { .family: INET, port: 22, addr: 127.0.0.1 }, addrlen: 16) = 0
> root@localhost's password:
>
> in perf-tools-next when building with BUILD_BPF_SKEL=1 that will hook
> into that specific syscall and collect the uservaddr sockaddr struct and
> then pretty print it.
>
> That is done manually (the last leg) in
> tools/perf/trace/beauty/sockaddr.c:
>
> syscall_arg__scnprintf_augmented_sockaddr
> af_scnprintfs[family](syscall pointer contents collected via BPF)
>
> which leads to struct sockaddr_in or sockaddr_in6 specific pretty
> printers, I wanted to do what these two struct specific pretty printers
> do but using BTF.
>
> I guess this is already available, but from a _really_ quick look at
> libbpf I couldn't find it, ideas?
>

This would be great! If you take a look in btf_dump.c, there's

int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
                             const void *data, size_t data_sz,
                             const struct btf_dump_type_data_opts *opts)

This will dump a typed representation of the data, presuming it is of
the BTF type specified by id. You get output like

(struct net){
 .passive = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)2,
  },
...

You need to call

struct btf_dump *btf_dump__new(const struct btf *btf,
                               btf_dump_printf_fn_t printf_fn,
                               void *ctx,
                               const struct btf_dump_opts *opts)


...first to get a struct btf_dump *; as you can see above, you supply
your own print function. The struct btf_dump_type_data_opts passed to
btf_dump__dump_type_data() has options to control indentation (tab
versus spaces, via indent_str and indent_level), compactness (compact),
etc. If there's something else you need from the perf side, let me know
and we can try to add it to libbpf.
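
Since the goal is to dump into a buffer, the print function is where
that happens. Below is a minimal sketch of such a callback, not tested
against perf. The names buf_sink, buf_sink_printf and buf_sink_emit are
made up for illustration; only the callback signature comes from
libbpf's btf_dump_printf_fn_t. The trailing comment outlines how it
would be handed to btf_dump__new().

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sink: a caller-owned buffer plus a write offset. */
struct buf_sink {
	char *buf;
	size_t size;	/* total capacity, including the trailing NUL */
	size_t len;	/* bytes written so far, excluding the NUL */
};

/*
 * Same shape as libbpf's btf_dump_printf_fn_t:
 *   void (*)(void *ctx, const char *fmt, va_list args)
 * Appends formatted output to the sink and truncates once it is full.
 */
static void buf_sink_printf(void *ctx, const char *fmt, va_list args)
{
	struct buf_sink *s = ctx;
	size_t room;
	int n;

	if (s->len + 1 >= s->size)
		return;			/* no room left, drop the output */

	room = s->size - s->len;
	n = vsnprintf(s->buf + s->len, room, fmt, args);
	if (n < 0)
		return;			/* formatting error, keep what we have */
	if ((size_t)n >= room)
		s->len = s->size - 1;	/* output was truncated */
	else
		s->len += (size_t)n;
}

/* Hypothetical variadic wrapper so the sink can be exercised without libbpf. */
static void buf_sink_emit(struct buf_sink *s, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	buf_sink_printf(s, fmt, ap);
	va_end(ap);
}

/*
 * With libbpf, the sink would be wired up roughly like this
 * (error handling omitted, "sockaddr_in" chosen only as an example):
 *
 *   struct btf *btf = btf__load_vmlinux_btf();
 *   struct btf_dump *d = btf_dump__new(btf, buf_sink_printf, &sink, NULL);
 *   __s32 id = btf__find_by_name_kind(btf, "sockaddr_in", BTF_KIND_STRUCT);
 *   LIBBPF_OPTS(btf_dump_type_data_opts, opts, .compact = true);
 *
 *   btf_dump__dump_type_data(d, id, data, data_sz, &opts);
 *   btf_dump__free(d);
 *   btf__free(btf);
 */
```

Truncating silently keeps the callback simple; a caller that needs to
know about truncation could check whether len reached size - 1.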

I coded up a proof-of-concept example using this stuff to dump kernel
function arguments; it's called ksnoop and is in bcc:

https://github.com/iovisor/bcc/blob/master/libbpf-tools/ksnoop.bpf.c
https://github.com/iovisor/bcc/blob/master/libbpf-tools/ksnoop.c

...so that will probably help with the details. You probably want a
similar approach; something like

- foreach syscall
  - populate BPF map with vmlinux BTF ids of args/return types,
    and associated sizes of data to store + whether it is a
    pointer (since in that case we need to copy memory at pointer
    address)

Then the bpf program can use that info to copy the right amount of
memory to the associated buffer and dump it to userspace for display.

This would allow you to have a generic augmented raw syscall BPF
program; it would just need a way to look up the appropriate map entry
describing its arguments etc. ksnoop does this by storing the map
entries by function address, and in kprobe context it then looks up the
instruction pointer to get the right map entry.
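
To make the map entry idea a bit more concrete, here is a rough
userspace sketch of what one entry could look like and how the copy
decision could be made from it. All names and the layout (arg_desc,
syscall_desc, collect_arg, MAX_COPY_SIZE) are hypothetical; this is not
the layout ksnoop or perf uses. The memcpy() calls stand in for what a
BPF program would do with a probe-read helper, and keying the map by
syscall number rather than function address is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYSCALL_ARGS 6
#define MAX_COPY_SIZE 4096	/* hypothetical cap on bytes copied per argument */

/* Hypothetical per-argument description, filled in from vmlinux BTF. */
struct arg_desc {
	uint32_t btf_id;	/* BTF id; for pointers, id of the pointed-to type */
	uint32_t size;		/* bytes of data to store for this argument */
	bool is_pointer;	/* true: copy 'size' bytes from the pointed-to address */
};

/* Hypothetical map value, one per syscall. */
struct syscall_desc {
	uint32_t nr_args;
	struct arg_desc args[MAX_SYSCALL_ARGS];
	struct arg_desc ret;
};

/*
 * Copy the data for argument 'idx' into dst and return the number of
 * bytes stored, or 0 if nothing could be stored.
 * 'raw' is the argument value as seen at syscall entry.
 */
static size_t collect_arg(const struct syscall_desc *sc, uint32_t idx,
			  uint64_t raw, void *dst, size_t dst_sz)
{
	const struct arg_desc *arg;
	size_t n;

	if (idx >= sc->nr_args)
		return 0;
	arg = &sc->args[idx];

	if (!arg->is_pointer) {
		/* Scalars: store the full raw value, the dumper narrows it. */
		if (dst_sz < sizeof(raw))
			return 0;
		memcpy(dst, &raw, sizeof(raw));
		return sizeof(raw);
	}

	/* Pointers: copy the pointed-to memory, bounded by the cap. */
	n = arg->size > MAX_COPY_SIZE ? MAX_COPY_SIZE : arg->size;
	if (raw == 0 || n == 0 || n > dst_sz)
		return 0;
	/* In a BPF program this would be a probe-read helper call. */
	memcpy(dst, (const void *)(uintptr_t)raw, n);
	return n;
}
```

Userspace would then pair each stored chunk with the btf_id from the
same entry when calling btf_dump__dump_type_data().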

There's more info at

https://blogs.oracle.com/linux/post/kernel-data-centric-tracing

Hope this helps,


Alan

> I want to try the code at the end of this message for another
> multiplexer syscall, bpf(), with this on top of what is at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
>
> Best regards,
>
> - Arnaldo
>
> diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> index 9c1d0b271b20f693..79767422efe9479c 100644
> --- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> @@ -319,6 +319,27 @@ int sys_enter_perf_event_open(struct syscall_enter_args *args)
>  	return 1; /* Failure: don't filter */
>  }
>
> +SEC("tp/syscalls/sys_enter_bpf")
> +int sys_enter_bpf(struct syscall_enter_args *args)
> +{
> +	struct augmented_args_payload *augmented_args = augmented_args_payload();
> +	const void *attr = (void *)args->args[1];
> +	unsigned int size = args->args[2];
> +	unsigned int len = sizeof(augmented_args->args);
> +
> +	if (augmented_args == NULL)
> +		goto failure;
> +
> +	size &= sizeof(augmented_args->__data) - 1;
> +
> +	if (bpf_probe_read(&augmented_args->__data, size, attr) < 0)
> +		goto failure;
> +
> +	return augmented__output(args, augmented_args, len + size);
> +failure:
> +	return 1; /* Failure: don't filter */
> +}
> +
>  SEC("tp/syscalls/sys_enter_clock_nanosleep")
>  int sys_enter_clock_nanosleep(struct syscall_enter_args *args)
>  {