Re: [PATCH/RFC v3] perf core: Allow setting up max frame stack depth via sysctl

From: Brendan Gregg
Date: Tue Apr 26 2016 - 16:03:11 EST


On Mon, Apr 25, 2016 at 5:49 PM, Brendan Gregg
<brendan.d.gregg@xxxxxxxxx> wrote:
> On Mon, Apr 25, 2016 at 5:47 PM, Arnaldo Carvalho de Melo
> <arnaldo.melo@xxxxxxxxx> wrote:
>> Em Mon, Apr 25, 2016 at 05:44:00PM -0700, Alexei Starovoitov escreveu:
>>> On Mon, Apr 25, 2016 at 09:29:28PM -0300, Arnaldo Carvalho de Melo wrote:
>>> > Em Mon, Apr 25, 2016 at 05:07:26PM -0700, Alexei Starovoitov escreveu:
>>> > > > + {
>>> > > > + .procname = "perf_event_max_stack",
>>> > > > + .data = NULL, /* filled in by handler */
>>> > > > + .maxlen = sizeof(sysctl_perf_event_max_stack),
>>> > > > + .mode = 0644,
>>> > > > + .proc_handler = perf_event_max_stack_handler,
>>> > > > + .extra1 = &zero,
>>> > > > + },
>>> >
>>> > > you need to define a max value otherwise perf_callchain_entry__sizeof
>>> > > will overflow. Sure it's root only facility, but still not nice.
>>> > > 1M? Anything above 1M stack frames would be insane anyway.
>>> > > The rest looks good. Thanks!
>>> >
>>> > Something else? ;-)
>>>
>>> all looks good to me. Thanks a bunch!
>>
>> Thanks for checking!
>>
>>> > Because we only allocate the callchain percpu data structures when there
>>> > is a user, which allows for changing the max easily, it's just a matter
>>> > of having no callchain users at that point.
>>> >
>>> > Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@xxxxxxxxx>
>>> > Acked-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>>>
>>> yep :)
>>> hopefully Brendan can give it another spin.
>>
>> Agreed, and I'm calling it a day anyway, Brendan, please consider
>> retesting, thanks,
>>
>
> Will do, thanks!
>

Looks good.

I started with max depth = 512, but the stacks were still truncated at
that depth, so I had to profile again at 1024 to capture them in full.
The result generally matches the flame graph I generated with V1, which
made me want to check that I really am running the new patch. I am:

# grep six_hundred_forty_kb /proc/kallsyms
ffffffff81c431e0 d six_hundred_forty_kb
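
For context, the symbol checked above looks like the upper bound added in
response to the overflow concern quoted earlier. Below is a minimal
user-space sketch of why a bound matters. It rests on assumptions: the
limit is taken to be 640 * 1024 (inferred from the symbol name only), an
entry is taken to be a u64 count followed by one u64 per frame, and both
helper names are hypothetical. The unchecked variant uses 32-bit
arithmetic purely to make the wrap visible; the width the kernel
actually uses is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound, inferred from the symbol name six_hundred_forty_kb. */
#define ASSUMED_MAX_STACK_LIMIT (640u * 1024u)

/* Assumed entry layout: a u64 frame count followed by one u64 per frame. */
#define ENTRY_HEADER_BYTES ((uint32_t)sizeof(uint64_t))
#define FRAME_BYTES        ((uint32_t)sizeof(uint64_t))

/*
 * Hypothetical helper: compute the per-entry buffer size for a requested
 * depth. Returns false and leaves *out untouched when the depth exceeds
 * the assumed limit.
 */
static bool callchain_entry_size(uint32_t max_stack, uint32_t *out)
{
	assert(out != NULL);

	if (max_stack > ASSUMED_MAX_STACK_LIMIT)
		return false;

	/* At most 8 + 8 * 655360 = 5242888 bytes, well inside 32 bits. */
	*out = ENTRY_HEADER_BYTES + FRAME_BYTES * max_stack;
	return true;
}

/* Same arithmetic with no range check: large depths wrap modulo 2^32. */
static uint32_t callchain_entry_size_unchecked(uint32_t max_stack)
{
	return ENTRY_HEADER_BYTES + FRAME_BYTES * max_stack;
}
```

With these assumptions a depth of 1024 needs 8200 bytes per entry, while
an unchecked depth of 0x20000000 wraps around to a size of just 8 bytes.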

I was mucking around and was able to get "corrupted callchain.
skipping..." errors, but these look to be expected -- that was
profiling a binary (bash) that doesn't have frame pointers. Some perf
script -D output:

16 3204735442777 0x18f0d8 [0x2030]: PERF_RECORD_SAMPLE(IP, 0x1): 18134/18134: 0xffffffff8118b6a4 period: 1001001 addr: 0
... FP chain: nr:1023
..... 0: ffffffffffffff80
..... 1: ffffffff8118b6a4
..... 2: ffffffff8118bc47
..... 3: ffffffff811d8c85
..... 4: ffffffff811b18f8
..... 5: ffffffff811b2a55
..... 6: ffffffff811b5ea0
..... 7: ffffffff810663c0
..... 8: ffffffff810666e0
..... 9: ffffffff817b9d28
..... 10: fffffffffffffe00
..... 11: 00000000004b45e2
..... 12: 000000000000610f
..... 13: 0000000000006110
..... 14: 0000000000006111
..... 15: 0000000000006112
..... 16: 0000000000006113
..... 17: 0000000000006114
..... 18: 0000000000006115
..... 19: 0000000000006116
..... 20: 0000000000006117
[...]
..... 1021: 000000000000650b
..... 1022: 000000000000650c
... thread: bash:18134
...... dso: /lib/modules/4.6.0-rc5-virtual/build/vmlinux
bash 18134 [016] 3204.735442: 1001001 cpu-clock:

Brendan