Re: [PATCH v5 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot

From: Andrii Nakryiko
Date: Thu Sep 02 2021 - 19:05:55 EST


On Thu, Sep 2, 2021 at 4:03 PM Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
> > On Sep 2, 2021, at 3:53 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Thu, Sep 2, 2021 at 9:58 AM Song Liu <songliubraving@xxxxxx> wrote:
> >>
> >> Introduce bpf_get_branch_snapshot(), which allows a tracing program to get
> >> a branch trace from hardware (e.g. Intel LBR). To use the feature, the
> >> user needs to create a perf_event with proper branch_record filtering
> >> on each cpu, and then call bpf_get_branch_snapshot in the bpf function.
> >> On Intel CPUs, VLBR event (raw event 0x1b00) can be used for this.
> >>
> >> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> >> ---
> >> include/uapi/linux/bpf.h | 22 ++++++++++++++++++++++
> >> kernel/bpf/trampoline.c | 3 ++-
> >> kernel/trace/bpf_trace.c | 33 +++++++++++++++++++++++++++++++++
> >> tools/include/uapi/linux/bpf.h | 22 ++++++++++++++++++++++
> >> 4 files changed, 79 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 791f31dd0abee..c986e6fad5bc0 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -4877,6 +4877,27 @@ union bpf_attr {
> >> * Get the struct pt_regs associated with **task**.
> >> * Return
> >> * A pointer to struct pt_regs.
> >> + *
> >> + * long bpf_get_branch_snapshot(void *entries, u32 size, u64 flags)
> >> + * Description
> >> + * Get branch trace from hardware engines like Intel LBR. The
> >> + * branch trace is taken soon after the trigger point of the
> >> + * BPF program, so it may contain some entries after the
> >
> > This part is a leftover from previous design, so not relevant anymore?
>
> Hmm.. This is still relevant, but not very accurate. I guess we should
> provide more information, like "For more information about branches before
> the trigger point, this should be called early in the BPF program".

I read the part about "taken soon after the trigger point of BPF
program" as a reference to the previous implementation. So maybe let's
clarify that, because LBR is not frozen, a few records can be lost to
internal kernel calls between the time bpf_get_branch_snapshot() is
called and the time we actually capture the LBRs, so the user has to
be aware of that.

>
> Song
>
>
> >
> >> + * trigger point. The user needs to filter these entries
> >> + * accordingly.
> >> + *
> >> + * The data is stored as struct perf_branch_entry into output
> >> + * buffer *entries*. *size* is the size of *entries* in bytes.
> >> + * *flags* is reserved for now and must be zero.
> >> + *
> >> + * Return
> >> + * On success, number of bytes written to *buf*. On error, a
> >> + * negative value.
> >> + *
> >> + * **-EINVAL** if arguments invalid or **size** not a multiple
> >> + * of **sizeof**\ (**struct perf_branch_entry**\ ).
> >> + *
> >> + * **-ENOENT** if architecture does not support branch records.
> >
> > [...]
>
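
To make sure I am reading the Return section the way it is intended, here
is a rough user-space model of the argument checks and the byte-count
return value. This is only a sketch under assumptions, not the kernel
code: struct model_branch_entry, model_get_branch_snapshot() and
model_entry_count() are hypothetical names, the struct is a stand-in for
struct perf_branch_entry (only its size matters here), and hw_cnt == 0 is
my way of modeling "no branch records available".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct perf_branch_entry; the real layout lives in
 * include/uapi/linux/perf_event.h. Only the size matters here. */
struct model_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Hypothetical model of the documented contract, NOT the kernel code.
 * hw/hw_cnt play the role of the records the hardware provides;
 * hw_cnt == 0 models "no branch record support". */
static long model_get_branch_snapshot(void *entries, uint32_t size,
				      uint64_t flags,
				      const struct model_branch_entry *hw,
				      uint32_t hw_cnt)
{
	const uint32_t entry_size = sizeof(struct model_branch_entry);
	uint32_t to_copy;

	assert(entries != NULL);

	/* flags is reserved and size must hold whole entries */
	if (flags != 0 || size % entry_size != 0)
		return -EINVAL;
	if (hw_cnt == 0)
		return -ENOENT;

	/* copy as many whole entries as fit in the caller's buffer */
	to_copy = hw_cnt * entry_size;
	if (to_copy > size)
		to_copy = size;
	memcpy(entries, hw, to_copy);

	/* success: number of bytes written, not number of entries */
	return to_copy;
}

/* Caller side: convert the byte count into a number of entries,
 * passing negative error codes through unchanged. */
static long model_entry_count(long ret)
{
	return ret < 0 ? ret : ret / (long)sizeof(struct model_branch_entry);
}
```

The point for users is that a successful return is in bytes, so they
have to divide by the entry size before walking the buffer.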