Re: [PATCH 1/1] tools/dtrace: initial implementation of DTrace

From: Kris Van Hees
Date: Mon Jul 08 2019 - 18:40:12 EST


On Mon, Jul 08, 2019 at 02:15:37PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jul 03, 2019 at 08:14:30PM -0700, Kris Van Hees escreveu:
> > This initial implementation of a tiny subset of DTrace functionality
> > provides the following options:
> >
> > dtrace [-lvV] [-b bufsz] -s script
> > -b set trace buffer size
> > -l list probes (only works with '-s script' for now)
> > -s enable or list probes for the specified BPF program
> > -V report DTrace API version
> >
> > The patch comprises quite a bit of code due to DTrace requiring a few
> > crucial components, even in its most basic form.
> >
> > The code is structured around the command line interface implemented in
> > dtrace.c. It provides option parsing and drives the three modes of
> > operation that are currently implemented:
> >
> > 1. Report DTrace API version information.
> > Report the version information and terminate.
> >
> > 2. List probes in BPF programs.
> > Initialize the list of probes that DTrace recognizes, load BPF
> > programs, parse all BPF ELF section names, resolve them into
> > known probes, and emit the probe names. Then terminate.
> >
> > 3. Load BPF programs and collect tracing data.
> > Initialize the list of probes that DTrace recognizes, load BPF
> > programs and attach them to their corresponding probes, set up
> > perf event output buffers, and start processing tracing data.
> >
> > This implementation makes extensive use of BPF (handled by dt_bpf.c) and
> > the perf event output ring buffer (handled by dt_buffer.c). DTrace-style
> > probe handling (dt_probe.c) offers an interface to probes that hides the
> > implementation details of the individual probe types by provider (dt_fbt.c
> > and dt_syscall.c). Probe lookup by name uses a hashtable implementation
> > (dt_hash.c). The dt_utils.c code populates a list of online CPU ids, so
> > we know what CPUs we can obtain tracing data from.
> >
> > Building the tool is trivial because its only dependency (libbpf) is in
> > the kernel tree under tools/lib/bpf. A simple 'make' in the tools/dtrace
> > directory suffices.
> >
> > The 'dtrace' executable needs to run as root because BPF programs cannot
> > be loaded by non-root users.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> > Reviewed-by: David Mc Lean <david.mclean@xxxxxxxxxx>
> > Reviewed-by: Eugene Loh <eugene.loh@xxxxxxxxxx>
> > ---
> > MAINTAINERS | 6 +
> > tools/dtrace/Makefile | 88 ++++++++++
> > tools/dtrace/bpf_sample.c | 145 ++++++++++++++++
> > tools/dtrace/dt_bpf.c | 188 +++++++++++++++++++++
> > tools/dtrace/dt_buffer.c | 331 +++++++++++++++++++++++++++++++++++++
> > tools/dtrace/dt_fbt.c | 201 ++++++++++++++++++++++
> > tools/dtrace/dt_hash.c | 211 +++++++++++++++++++++++
> > tools/dtrace/dt_probe.c | 230 ++++++++++++++++++++++++++
> > tools/dtrace/dt_syscall.c | 179 ++++++++++++++++++++
> > tools/dtrace/dt_utils.c | 132 +++++++++++++++
> > tools/dtrace/dtrace.c | 249 ++++++++++++++++++++++++++++
> > tools/dtrace/dtrace.h | 13 ++
> > tools/dtrace/dtrace_impl.h | 101 +++++++++++
> > 13 files changed, 2074 insertions(+)
> > create mode 100644 tools/dtrace/Makefile
> > create mode 100644 tools/dtrace/bpf_sample.c
> > create mode 100644 tools/dtrace/dt_bpf.c
> > create mode 100644 tools/dtrace/dt_buffer.c
> > create mode 100644 tools/dtrace/dt_fbt.c
> > create mode 100644 tools/dtrace/dt_hash.c
> > create mode 100644 tools/dtrace/dt_probe.c
> > create mode 100644 tools/dtrace/dt_syscall.c
> > create mode 100644 tools/dtrace/dt_utils.c
> > create mode 100644 tools/dtrace/dtrace.c
> > create mode 100644 tools/dtrace/dtrace.h
> > create mode 100644 tools/dtrace/dtrace_impl.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 606d1f80bc49..668468834865 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -5474,6 +5474,12 @@ W: https://linuxtv.org
> > S: Odd Fixes
> > F: drivers/media/pci/dt3155/
> >
> > +DTRACE
> > +M: Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> > +L: dtrace-devel@xxxxxxxxxxxxxx
> > +S: Maintained
> > +F: tools/dtrace/
> > +
> > DVB_USB_AF9015 MEDIA DRIVER
> > M: Antti Palosaari <crope@xxxxxx>
> > L: linux-media@xxxxxxxxxxxxxxx
> > diff --git a/tools/dtrace/Makefile b/tools/dtrace/Makefile
> > new file mode 100644
> > index 000000000000..99fd0f9dd1d6
> > --- /dev/null
> > +++ b/tools/dtrace/Makefile
> > @@ -0,0 +1,88 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# This Makefile is based on samples/bpf.
> > +#
> > +# Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> > +
> > +DT_VERSION := 2.0.0
> > +DT_GIT_VERSION := $(shell git rev-parse HEAD 2>/dev/null || \
> > + echo Unknown)
> > +
> > +DTRACE_PATH ?= $(abspath $(srctree)/$(src))
> > +TOOLS_PATH := $(DTRACE_PATH)/..
> > +SAMPLES_PATH := $(DTRACE_PATH)/../../samples
> > +
> > +hostprogs-y := dtrace
> > +
> > +LIBBPF := $(TOOLS_PATH)/lib/bpf/libbpf.a
> > +OBJS := dt_bpf.o dt_buffer.o dt_utils.o dt_probe.o \
> > + dt_hash.o \
> > + dt_fbt.o dt_syscall.o
> > +
> > +dtrace-objs := $(OBJS) dtrace.o
> > +
> > +always := $(hostprogs-y)
> > +always += bpf_sample.o
> > +
> > +KBUILD_HOSTCFLAGS += -DDT_VERSION=\"$(DT_VERSION)\"
> > +KBUILD_HOSTCFLAGS += -DDT_GIT_VERSION=\"$(DT_GIT_VERSION)\"
> > +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib
> > +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/perf
>
> Interesting, what are you using from tools/perf/? So that we can move to
> tools/{include,lib,arch}.

This is my mistake... an earlier version of the code (as I was developing it)
was using stuff from tools/perf, but that is no longer the case. Removing it.

> > +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/include/uapi
> > +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/include/
> > +KBUILD_HOSTCFLAGS += -I$(srctree)/usr/include
> > +
> > +KBUILD_HOSTLDLIBS := $(LIBBPF) -lelf
> > +
> > +LLC ?= llc
> > +CLANG ?= clang
> > +LLVM_OBJCOPY ?= llvm-objcopy
> > +
> > +ifdef CROSS_COMPILE
> > +HOSTCC = $(CROSS_COMPILE)gcc
> > +CLANG_ARCH_ARGS = -target $(ARCH)
> > +endif
> > +
> > +all:
> > + $(MAKE) -C ../../ $(CURDIR)/ DTRACE_PATH=$(CURDIR)
> > +
> > +clean:
> > + $(MAKE) -C ../../ M=$(CURDIR) clean
> > + @rm -f *~
> > +
> > +$(LIBBPF): FORCE
> > + $(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(DTRACE_PATH)/../../ O=
> > +
> > +FORCE:
> > +
> > +.PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC)
> > +
> > +verify_cmds: $(CLANG) $(LLC)
> > + @for TOOL in $^ ; do \
> > + if ! (which -- "$${TOOL}" > /dev/null 2>&1); then \
> > + echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
> > + exit 1; \
> > + else true; fi; \
> > + done
> > +
> > +verify_target_bpf: verify_cmds
> > + @if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
> > + echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
> > + echo " NOTICE: LLVM version >= 3.7.1 required" ;\
> > + exit 2; \
> > + else true; fi
> > +
> > +$(DTRACE_PATH)/*.c: verify_target_bpf $(LIBBPF)
> > +$(src)/*.c: verify_target_bpf $(LIBBPF)
> > +
> > +$(obj)/%.o: $(src)/%.c
> > + @echo " CLANG-bpf " $@
> > + $(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
> > + -I$(srctree)/tools/testing/selftests/bpf/ \
> > + -D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
> > + -D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
> > + -Wno-gnu-variable-sized-type-not-at-end \
> > + -Wno-address-of-packed-member -Wno-tautological-compare \
> > + -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
> > + -I$(srctree)/samples/bpf/ -include asm_goto_workaround.h \
> > + -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
>
>
> We have the above in tools/perf/util/llvm-utils.c, perhaps we need to
> move it to some place in lib/ to share?

Yes, if there is a way to put things like this in a central location so that
we can maintain a single copy, that would indeed be a good idea.

> > diff --git a/tools/dtrace/bpf_sample.c b/tools/dtrace/bpf_sample.c
> > new file mode 100644
> > index 000000000000..49f350390b5f
> > --- /dev/null
> > +++ b/tools/dtrace/bpf_sample.c
> > @@ -0,0 +1,145 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * This sample DTrace BPF tracing program demonstrates how actions can be
> > + * associated with different probe types.
> > + *
> > + * The kprobe/ksys_write probe is a Function Boundary Tracing (FBT) entry probe
> > + * on the ksys_write(fd, buf, count) function in the kernel. Arguments to the
> > + * function can be retrieved from the CPU registers (struct pt_regs).
> > + *
> > + * The tracepoint/syscalls/sys_enter_write probe is a System Call entry probe
> > + * for the write(d, buf, count) system call. Arguments to the system call can
> > + * be retrieved from the tracepoint data passed to the BPF program as context
> > + * (struct syscall_data) when the probe fires.
> > + *
> > + * The BPF program associated with each probe prepares a DTrace BPF context
> > + * (struct dt_bpf_context) that stores the probe ID and up to 10 arguments.
> > + * Only 3 arguments are used in this sample. Then the programs call a shared
> > + * BPF function (bpf_action) that implements the actual action to be taken when
> > + * a probe fires. It prepares a data record to be stored in the tracing buffer
> > + * and submits it to the buffer. The data in the data record is obtained from
> > + * the DTrace BPF context.
> > + *
> > + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> > + */
> > +#include <uapi/linux/bpf.h>
> > +#include <linux/ptrace.h>
> > +#include <linux/version.h>
> > +#include <uapi/linux/unistd.h>
> > +#include "bpf_helpers.h"
> > +
> > +#include "dtrace.h"
> > +
> > +struct syscall_data {
> > + struct pt_regs *regs;
> > + long syscall_nr;
> > + long arg[6];
> > +};
> > +
> > +struct bpf_map_def SEC("maps") buffers = {
> > + .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> > + .key_size = sizeof(u32),
> > + .value_size = sizeof(u32),
> > + .max_entries = NR_CPUS,
> > +};
> > +
> > +#if defined(__amd64)
> > +# define GET_REGS_ARG0(regs) ((regs)->di)
> > +# define GET_REGS_ARG1(regs) ((regs)->si)
> > +# define GET_REGS_ARG2(regs) ((regs)->dx)
> > +# define GET_REGS_ARG3(regs) ((regs)->cx)
> > +# define GET_REGS_ARG4(regs) ((regs)->r8)
> > +# define GET_REGS_ARG5(regs) ((regs)->r9)
> > +#else
> > +# warning Argument retrieval from pt_regs is not supported yet on this arch.
> > +# define GET_REGS_ARG0(regs) 0
> > +# define GET_REGS_ARG1(regs) 0
> > +# define GET_REGS_ARG2(regs) 0
> > +# define GET_REGS_ARG3(regs) 0
> > +# define GET_REGS_ARG4(regs) 0
> > +# define GET_REGS_ARG5(regs) 0
> > +#endif
>
> We have this in tools/testing/selftests/bpf/bpf_helpers.h, probably need
> to move to some other place in tools/include/ where this can be shared.

I should be using the ones in bpf_helpers (since I already include that
anyway), and yes, if we can move that to a general use location under
tools/include that would be a good idea.

Also, I just updated my code to use these, and I added a PT_REGS_PARM6(x) for
all the listed archs because I need to be able to get at up to 6 parameters
rather than the 5 currently supported. As far as I can see, all listed archs
support passing at least 6 arguments, so this should be no problem.
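
For illustration only, the x86_64 case might look roughly like the sketch
below. struct mock_pt_regs and dt_copy_args() are hypothetical names made up
for this example so that it compiles in userspace; they are not part of the
patch. PT_REGS_PARM1..5 follow the existing x86 definitions in bpf_helpers.h,
and mapping PT_REGS_PARM6 to r9 assumes the System V AMD64 calling convention.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the x86_64 struct pt_regs, reduced to
 * the six argument registers so the sketch builds outside the kernel tree.
 */
struct mock_pt_regs {
	uint64_t di, si, dx, cx, r8, r9;
};

/* Existing x86 accessors, mirroring bpf_helpers.h. */
#define PT_REGS_PARM1(x) ((x)->di)
#define PT_REGS_PARM2(x) ((x)->si)
#define PT_REGS_PARM3(x) ((x)->dx)
#define PT_REGS_PARM4(x) ((x)->cx)
#define PT_REGS_PARM5(x) ((x)->r8)

/* Assumed addition: the sixth function argument is passed in r9. */
#define PT_REGS_PARM6(x) ((x)->r9)

/*
 * Hypothetical helper: copy the first six function arguments out of the
 * register set into a flat array, as a probe handler might do when filling
 * in its context.
 */
static inline void dt_copy_args(const struct mock_pt_regs *regs,
				uint64_t argv[6])
{
	argv[0] = PT_REGS_PARM1(regs);
	argv[1] = PT_REGS_PARM2(regs);
	argv[2] = PT_REGS_PARM3(regs);
	argv[3] = PT_REGS_PARM4(regs);
	argv[4] = PT_REGS_PARM5(regs);
	argv[5] = PT_REGS_PARM6(regs);
}
```

The other archs would each need their own register (or stack slot) for the
sixth argument; the sketch does not attempt to cover those.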

Any objections?