Re: [PATCH v5 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot

From: Song Liu
Date: Thu Sep 02 2021 - 19:03:53 EST

> On Sep 2, 2021, at 3:53 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Sep 2, 2021 at 9:58 AM Song Liu <songliubraving@xxxxxx> wrote:
>>
>> Introduce bpf_get_branch_snapshot(), which allows a tracing program to get
>> a branch trace from hardware (e.g. Intel LBR). To use the feature, the
>> user needs to create a perf_event with proper branch_record filtering
>> on each CPU, and then call bpf_get_branch_snapshot() in the bpf function.
>> On Intel CPUs, the VLBR event (raw event 0x1b00) can be used for this.
>>
>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
>> ---
>> include/uapi/linux/bpf.h | 22 ++++++++++++++++++++++
>> kernel/bpf/trampoline.c | 3 ++-
>> kernel/trace/bpf_trace.c | 33 +++++++++++++++++++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 22 ++++++++++++++++++++++
>> 4 files changed, 79 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 791f31dd0abee..c986e6fad5bc0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -4877,6 +4877,27 @@ union bpf_attr {
>> * Get the struct pt_regs associated with **task**.
>> * Return
>> * A pointer to struct pt_regs.
>> + *
>> + * long bpf_get_branch_snapshot(void *entries, u32 size, u64 flags)
>> + * Description
>> + * Get branch trace from hardware engines like Intel LBR. The
>> + * branch trace is taken soon after the trigger point of the
>> + * BPF program, so it may contain some entries after the
>
> This part is a leftover from previous design, so not relevant anymore?

Hmm... This is still relevant, but not very accurate. I guess we should
provide more information, like "To capture more branches from before
the trigger point, this should be called early in the BPF program".
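
To make the "filter these entries" part concrete, here is a rough userspace
sketch of how a caller might post-process the buffer. It is not part of the
patch. struct branch_entry is a hypothetical stand-in for struct
perf_branch_entry (the real struct packs flag bits after from/to), and the
function names and the [lo, hi) address range are assumptions made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct perf_branch_entry: the real struct
 * also starts with from/to, followed by packed flag bits. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Mirrors the documented -EINVAL rule: size must be a multiple of
 * the entry size. Returns nonzero when the size is acceptable. */
static int snapshot_size_ok(uint32_t size)
{
	return size % sizeof(struct branch_entry) == 0;
}

/* The helper is documented to return bytes written or a negative error.
 * Convert that to an entry count, passing errors through unchanged. */
static long snapshot_entry_count(long ret)
{
	if (ret < 0)
		return ret;
	return ret / (long)sizeof(struct branch_entry);
}

/* Count entries whose branch source lies in [lo, hi), e.g. the address
 * range of the traced function. Entries outside the range (such as
 * branches taken after the trigger point) are ignored. */
static long count_hits(const struct branch_entry *e, long n,
		       uint64_t lo, uint64_t hi)
{
	long hits = 0;

	for (long i = 0; i < n; i++)
		if (e[i].from >= lo && e[i].from < hi)
			hits++;
	return hits;
}
```

The range check is only one possible filter; which entries count as "after
the trigger point" depends on where in the program the snapshot is taken.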

Song


>
>> + * trigger point. The user needs to filter these entries
>> + * accordingly.
>> + *
>> + * The data is stored as struct perf_branch_entry into output
>> + * buffer *entries*. *size* is the size of *entries* in bytes.
>> + * *flags* is reserved for now and must be zero.
>> + *
>> + * Return
>> + * On success, number of bytes written to *entries*. On error, a
>> + * negative value.
>> + *
>> + * **-EINVAL** if arguments invalid or **size** not a multiple
>> + * of **sizeof**\ (**struct perf_branch_entry**\ ).
>> + *
>> + * **-ENOENT** if architecture does not support branch records.
>
> [...]