Re: Getting empty callchain from perf_callchain_kernel()

From: Kairui Song
Date: Thu May 23 2019 - 22:24:01 EST


On Fri, May 24, 2019 at 1:27 AM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
> > On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > >
> > > On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > > > Hi Josh, this still won't fix the problem.
> > > > > >
> > > > > > Problem is not (or not only) with ___bpf_prog_run, what actually went
> > > > > > wrong is with the JITed bpf code.
> > > > >
> > > > > There seem to be a bunch of issues. My patch at least fixes the failing
> > > > > selftest reported by Alexei for ORC.
> > > > >
> > > > > How can I recreate your issue?
> > > >
> > > > Hmm, I used bcc's example to attach a BPF program to a tracepoint, and
> > > > with that fix the stack trace is still invalid.
> > > >
> > > > CMD I used with bcc:
> > > > python3 ./tools/stackcount.py t:sched:sched_fork
> > >
> > > I've had problems in the past getting bcc to build, so I was hoping it
> > > was reproducible with a standalone selftest.
> > >
> > > > And I just tried applying your patch again; the selftest still fails.
> > >
> > > Is it the same selftest reported by Alexei?
> > >
> > > test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err -1 errno 2
> > >
> > > > I'm applying it on my local master branch, which is a few days older
> > > > than upstream. I can update and try again. Am I missing anything?
> > >
> > > The above patch had some issues, so with some configs you might see an
> > > objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> > > fix the test_stacktrace_map selftest.
> > >
> > > Here's the latest version which should fix it in all cases (based on
> > > tip/master):
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> >
> > Hmm, I still get the failure:
> > test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err -1 errno 2
> >
> > And I don't see how this will fix the issue. As long as ORC needs to
> > unwind through the JITed code, it will fail. And that will happen
> > before reaching ___bpf_prog_run.
>
> Ok, I was able to recreate by doing
>
> echo 1 > /proc/sys/net/core/bpf_jit_enable
>
> first. I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.
>

Yes, with the JIT off, your patch fixes the issue. I can confirm that.
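
Since the result depends on whether the JIT is enabled, it may help to check
the active mode before running the selftest. The following is a minimal
sketch, not part of the kernel selftests: the helper name bpf_jit_mode and its
output labels are made up for illustration. It relies only on the documented
meaning of the bpf_jit_enable sysctl (0 = interpreter, 1 = JIT, 2 = JIT with
debug output).

```shell
#!/bin/sh
# Hypothetical helper: report the BPF JIT mode from the sysctl file.
# The path is an optional argument so the function can be pointed at a
# copy of the file; it defaults to the real sysctl location.
bpf_jit_mode() {
    file="${1:-/proc/sys/net/core/bpf_jit_enable}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$file")" in
        0) echo "interpreter" ;;   # JIT disabled
        1) echo "jit" ;;           # JIT enabled
        2) echo "jit-debug" ;;     # JIT enabled, debug output to kernel log
        *) echo "unknown"; return 1 ;;
    esac
}

echo "BPF mode: $(bpf_jit_mode)"
```

If this prints "jit", the unwinder has to walk through JITed code, which is
the case that still fails here.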

--
Best Regards,
Kairui Song