Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

From: Alexei Starovoitov
Date: Tue May 05 2015 - 01:49:10 EST


On 5/4/15 9:41 PM, Wang Nan wrote:

That's great. Could you please add a description of 'llvm -s' to your README
or comments? Dumping eBPF instructions has cost me a lot of time, so I decided to
add it to perf...

sure. it's just the -filetype=asm flag to llc instead of -filetype=obj.
Eventually it will work like a normal 'clang -S file.c' once a few more
llvm commits are accepted upstream.

My colleague He Kuang is working on variable access. Probing inside a function body
and accessing its local variables will be supported like this:

SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
int prog(struct pt_regs *ctx, unsigned long vara) {
	// vara is the value of localvara of function func_name
	return 0;
}

that would be great. I'm not sure, though, how you can achieve that
without changing the C front-end?

It's not very difficult. He is trying to generate a loader for vara
as a prologue, then paste the prologue and the main eBPF program together.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
prologue fetches the value of vara and puts it into the proper register,
then the main program runs.
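A minimal sketch of the "paste" step, assuming both the prologue and the main program are available as arrays of instructions. struct bpf_insn here is a local copy of the kernel's UAPI definition so the example builds without kernel headers; paste_prologue() is a hypothetical helper, not the code He Kuang is writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct bpf_insn from include/uapi/linux/bpf.h. */
struct bpf_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate constant */
};

/* Hypothetical helper: allocate a new program made of the prologue
 * followed by the main program. The prologue must fall through into the
 * main program and leave r1 (ctx) intact. The caller frees the result. */
struct bpf_insn *paste_prologue(const struct bpf_insn *prologue, size_t n_pro,
				const struct bpf_insn *prog, size_t n_prog,
				size_t *n_out)
{
	struct bpf_insn *out;

	assert(n_out != NULL && n_pro + n_prog > 0);
	out = malloc((n_pro + n_prog) * sizeof(*out));
	if (!out)
		return NULL;
	memcpy(out, prologue, n_pro * sizeof(*out));
	memcpy(out + n_pro, prog, n_prog * sizeof(*out));
	*n_out = n_pro + n_prog;
	return out;
}
```

No relocation is needed: BPF jump offsets are relative to the jumping instruction, so shifting the whole main program down by n_pro slots leaves its internal jumps valid.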

got it. I think that's much cleaner than what I was proposing.
The only question is then:
char _prog_config[] = "prog: func_name:1234 vara=localvara"
should actually be something like "... r2=localvara", right?
since the prologue would need to assign into r2.
Otherwise I don't see how you would find out about 'vara' inside
compiled bpf code.
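If the config string names registers directly, the loader only has to pick out the "rN=name" tokens. A sketch assuming the space-separated format quoted above; parse_bindings() and struct reg_binding are hypothetical names, and restricting bindings to r2..r5 is a choice of this sketch (r1 keeps ctx, and BPF passes arguments in r1..r5).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "rN=var" binding from the (hypothetical) config string. */
struct reg_binding {
	int reg;	/* BPF register the prologue writes */
	char var[64];	/* local variable of the probed function */
};

/* Parse "rN=name" tokens from e.g. "prog: func_name:1234 r2=localvara".
 * Other tokens (program name, probe point) are skipped.
 * Returns the number of bindings stored, or -1 for a register outside r2..r5. */
int parse_bindings(const char *config, struct reg_binding *out, int max)
{
	char buf[256];
	int n = 0;

	assert(config != NULL && out != NULL);
	snprintf(buf, sizeof(buf), "%s", config);	/* strtok needs a writable copy */
	for (char *tok = strtok(buf, " "); tok && n < max; tok = strtok(NULL, " ")) {
		int reg;
		char var[64];

		if (sscanf(tok, "r%d=%63s", &reg, var) != 2)
			continue;	/* not a register binding */
		if (reg < 2 || reg > 5)
			return -1;
		out[n].reg = reg;
		snprintf(out[n].var, sizeof(out[n].var), "%s", var);
		n++;
	}
	return n;
}
```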

It would be nice if this could be done without debug info.
For example, in tracex2_kern.c I have:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
	long wr_size = ctx->dx; /* arg3 */
	/* ... */
	return 0;
}

with your prologue generator the above can be rewritten as:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
{
	/* use wr_size */
	return 0;
}

that will improve ease of use a lot.
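For arguments that arrive in registers, such a prologue needs no debug info: each extra parameter becomes one 64-bit load from ctx. A sketch assuming x86_64, where a C function's first four integer arguments are passed in di, si, dx and cx. The pt_regs mirror, BPF_LDX_MEM_DW and gen_arg_prologue() are local stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct bpf_insn so the sketch builds without kernel headers. */
struct bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Assumed x86_64 struct pt_regs field order (arch/x86/include/asm/ptrace.h). */
struct pt_regs_x86_64 {
	uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di, orig_ax, ip, cs, flags, sp, ss;
};

#define BPF_LDX_MEM_DW 0x79	/* BPF_LDX | BPF_MEM | BPF_DW */

/* Offsets of the first four integer argument registers. */
static const int16_t arg_off[] = {
	offsetof(struct pt_regs_x86_64, di),
	offsetof(struct pt_regs_x86_64, si),
	offsetof(struct pt_regs_x86_64, dx),
	offsetof(struct pt_regs_x86_64, cx),
};

/* Hypothetical generator: emit "r(2+i) = *(u64 *)(r1 + off)" for each of
 * nargs parameters after ctx. r1 (ctx) itself is never written. */
size_t gen_arg_prologue(int nargs, struct bpf_insn *out)
{
	assert(nargs >= 0 && nargs <= 4);
	for (int i = 0; i < nargs; i++) {
		out[i] = (struct bpf_insn){
			.code = BPF_LDX_MEM_DW,
			.dst_reg = 2 + i,
			.src_reg = 1,
			.off = arg_off[i],
			.imm = 0,
		};
	}
	return (size_t)nargs;
}
```

With nargs = 3 this loads di, si and dx into r2, r3 and r4, matching the bpf_prog(unused, fd, buf, wr_size) signature for sys_write.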

Another possible solution is to change the protocol between kprobes and eBPF
programs: make kprobes call the fetchers and pass the results to the eBPF program as
a second param (grouping all varx together).
A prologue may still be needed in this case to load each param into the correct
register.

you mean grouping varx together in some other struct and embedding it
together with pt_regs into a new container struct?
doable, but your first approach is quite clean already. why bother?
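One possible layout for the container struct under discussion, assuming a fixed upper bound on the number of fetched values. All names here (struct kprobe_bpf_ctx, MAX_FETCH_ARGS, the pt_regs stand-in, fetched_arg()) are hypothetical; a real version would also need the verifier's context-access checks to learn the new layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FETCH_ARGS 4	/* hypothetical limit on fetched values */

/* Stand-in for the architecture's struct pt_regs (21 u64 slots on x86_64). */
struct pt_regs_stub {
	uint64_t regs[21];
};

/* Hypothetical container: kprobe code would run the fetchers, fill
 * args[], then invoke the eBPF program with a pointer to this struct. */
struct kprobe_bpf_ctx {
	struct pt_regs_stub regs;	/* original register state, kept first */
	uint64_t args[MAX_FETCH_ARGS];	/* fetched values: vara, varb, ... */
	uint32_t nr_args;		/* number of valid entries in args[] */
};

/* How a program body would read fetched value i. */
uint64_t fetched_arg(const struct kprobe_bpf_ctx *ctx, uint32_t i)
{
	assert(i < ctx->nr_args);
	return ctx->args[i];
}
```

Keeping regs first means code that treats ctx as a struct pt_regs pointer still sees the same offsets.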

Could you please consider the following problem?

We found several __lock_page() calls that last a very long time. We want to
find the corresponding __unlock_page() calls so we can see what blocks them. We want to
insert an eBPF program before io_schedule() in __lock_page(), and also attach an eBPF program
to the entry of __unlock_page(), so we can compute the interval between page locking and
unlocking. If the interval exceeds a threshold, __unlock_page() triggers a perf sample
so we get its call stack. In this case, the eBPF program acts as a trace filter.

all makes sense and your use case fits quite well into the existing
bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
A problem of how to display that call stack from perf?
I would say it fits better as a sample than as a trace.
If you dump it as a trace, it won't be easy to decipher, whereas if you
treat it as a sampling event, the perf record/report facility will pick it up
and display it nicely. Meaning that one sample == lock_page/unlock_page
latency > N. Then the existing sample_callchain flag should work.
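The filtering logic of the __lock_page()/__unlock_page() example, modelled in plain user-space C so it can be run and tested. Assumptions: a small fixed table stands in for a BPF hash map keyed by page pointer, timestamps are passed in rather than read with bpf_ktime_get_ns(), and "emit a sample" is reduced to a boolean return value. on_lock_wait() and on_unlock() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64	/* hypothetical table size */

struct lock_slot {
	uintptr_t page;		/* page pointer used as key, 0 = empty */
	uint64_t t_lock;	/* when the first waiter went to sleep, in ns */
};

static struct lock_slot table[SLOTS];

/* Probe before io_schedule() in __lock_page(): record the wait start. */
void on_lock_wait(uintptr_t page, uint64_t now_ns)
{
	struct lock_slot *free_slot = NULL;

	assert(page != 0);
	for (int i = 0; i < SLOTS; i++) {
		if (table[i].page == page)
			return;	/* keep the earliest waiter's timestamp */
		if (!free_slot && table[i].page == 0)
			free_slot = &table[i];
	}
	if (free_slot) {
		free_slot->page = page;
		free_slot->t_lock = now_ns;
	}
	/* table full: the event is dropped */
}

/* Probe at __unlock_page() entry: true means "emit a perf sample here",
 * i.e. the page stayed locked longer than threshold_ns. */
bool on_unlock(uintptr_t page, uint64_t now_ns, uint64_t threshold_ns)
{
	assert(page != 0);
	for (int i = 0; i < SLOTS; i++) {
		if (table[i].page == page) {
			uint64_t delta = now_ns - table[i].t_lock;

			table[i].page = 0;	/* free the slot */
			return delta > threshold_ns;
		}
	}
	return false;	/* nobody was waiting on this page */
}
```

A true return is where the program would let the event through as a sample, so the callchain recorded with it shows the unlocking path.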
