Re: [PATCH 10/48] perf tools: Introduce bpf-output event

From: Wangnan (F)
Date: Tue Feb 23 2016 - 23:09:40 EST




On 2016/2/24 10:04, Wangnan (F) wrote:


On 2016/2/24 9:58, Wangnan (F) wrote:


On 2016/2/24 1:45, Arnaldo Carvalho de Melo wrote:
Em Mon, Feb 22, 2016 at 09:10:37AM +0000, Wang Nan escreveu:
Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
bpf_perf_event_output() helper) add a helper to enable BPF program
output data to perf ring buffer through a new type of perf event
PERF_COUNT_SW_BPF_OUTPUT. This patch enable perf to create perf
event of that type. Now perf user can use following cmdline to
receive output data from BPF programs:

# ./perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ ls /
# ./perf script
perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
...

Test result:
# cat ./test_bpf_output.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};

#define SEC(NAME) __attribute__((section(NAME), used))
static u64 (*ktime_get_ns)(void) =
(void *)BPF_FUNC_ktime_get_ns;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
(void *)BPF_FUNC_perf_event_output;

struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};

SEC("func_write=sys_write")
int func_write(void *ctx)
{
struct {
u64 ktime;
int cpuid;
} __attribute__((packed)) output_data;
char error_data[] = "Error: failed to output: %d\n";

output_data.cpuid = get_smp_processor_id();
output_data.ktime = ktime_get_ns(); supr
int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
&output_data, sizeof(output_data));
if (err)
trace_printk(error_data, sizeof(error_data), err);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************ END ***************************/

# ./perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ ls /
# ./perf script | grep ls
ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...
So, there is something strange here:

if (unlikely(event->oncpu != smp_processor_id()))
return -EOPNOTSUPP;



All failures have 'event->oncpu == -1' here. I guess we should suppress warning in
this case. But why event->oncpu becomes -1?


For this specific test it is not surprising to see these error messages. In this test
we create bpf-output channel on 'ls' process only, but the BPF script is triggered
on all procs (BPF triggering is not related to perf event scheduling). Trying to
output data through 'ls' specific bpf-output channel should fail if this 'sys_write'
is not issued by 'ls' or its children. So it is a correct behavior.

However, I also see them in system wide channel:

# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/
-a
^C[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 17.534 MB perf.data (264326 samples) ]
# cat /sys/kernel/debug/tracing/trace | tail
rs:main Q:Reg-582 [000] d..2 4858.711225: : Error: failed to output: -95
rs:main Q:Reg-582 [000] d..2 4858.711241: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.711436: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.711441: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.711473: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712215: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712224: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.712230: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712235: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712239: : Error: failed to output: -95

System wide events can also be scheduled in and out. If the bpf-output events
are scheduled out, trying to output data through it causes the above failure.
I don't think it is a problem.

Peter, Could you please give some infomation? In which case a system wide
bpf output channel would be scheduled out?

Thank you.