Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

From: Nick Desaulniers
Date: Fri Jan 15 2021 - 19:14:10 EST


> On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > <natechancellor@xxxxxxxxx> wrote:
> > >
> > > However, I see an issue with actually using the data:
> > >
> > > $ sudo -s
> > > # mount -t debugfs none /sys/kernel/debug
> > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > # chown nathan:nathan vmlinux.profraw
> > > # exit
> > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > error: No profiles could be merged.
> > >
> > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > "real" x86 hardware that I can afford to test on right now.
> >
> > Same.
> >
> > I think the magic calculation in this patch may differ from upstream
> > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
>
> Err...it looks like it was the padding calculation. With that fixed
> up, we can query the profile data to get insights on the most heavily
> called functions. Here's what my top 20 are (reset, then watch 10
> minutes worth of cat videos on youtube while running `find /` and
> sleeping at my desk). Anything curious stand out to anyone?

Hello world from my personal laptop whose kernel was rebuilt with
profiling data! Wow, I can run `find /` and watch cat videos on YouTube
so fast! Users will love this! /s

Checking the section sizes of .text.hot. and .text.unlikely. looks
good!
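
For anyone who wants to repeat that check, here is a minimal sketch in
Python. It assumes the section table comes from something like
`size -A` (or `llvm-size -A`) run on an object file, i.e. one
"name size addr" row per section. The helper name `section_sizes` and
the sample numbers are made up for illustration; they are not
measurements from this build.

```python
def section_sizes(size_output, prefixes=(".text.hot", ".text.unlikely")):
    """Sum section sizes per prefix from SysV-style `size -A` text.

    A section counts toward a prefix if its name equals the prefix or
    starts with the prefix followed by a dot (so `.text.hot.` and
    `.text.hot.foo` count, `.text.hotplug` does not).
    """
    totals = {p: 0 for p in prefixes}
    for line in size_output.splitlines():
        fields = line.split()
        # Skip the file name line, the column header, and blank lines.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        name, size = fields[0], int(fields[1])
        for p in prefixes:
            if name == p or name.startswith(p + "."):
                totals[p] += size
    return totals


# Hypothetical input, shaped like `size -A some_object.o` output.
SAMPLE = """some_object.o  :
section             size   addr
.text                100      0
.text.hot.            40      0
.text.hot.foo         10      0
.text.unlikely.       24      0
.text.hotplug          8      0
Total                182
"""

if __name__ == "__main__":
    for prefix, total in section_sizes(SAMPLE).items():
        print(f"{prefix}: {total} bytes")
```

Comparing these totals between an instrumented-profile build and a
plain build is one rough way to see whether the profile is moving code
between the hot and unlikely sections.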

>
> $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> Instrumentation level: IR entry_first = 0
> Total functions: 48970
> Maximum function count: 62070879
> Maximum internal block count: 83221158
> Top 20 functions with the largest internal block counts:
> drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> rcu_read_unlock_strict, max count = 62070879
> _cond_resched, max count = 25486882
> rcu_all_qs, max count = 25451477
> drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> _raw_spin_unlock_irqrestore, max count = 18874121
> drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> _raw_spin_lock_irqsave, max count = 18509161
> memchr, max count = 15525452
> _raw_spin_lock, max count = 15484254
> __mod_memcg_state, max count = 14604619
> __mod_memcg_lruvec_state, max count = 14602783
> fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> __mod_lruvec_state, max count = 12527154
> __mod_node_page_state, max count = 12525172
> native_sched_clock, max count = 8904692
> sched_clock_cpu, max count = 8895832
> sched_clock, max count = 8894627
> kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> fpregs_assert_state_consistent, max count = 8287198
>
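
If it helps with comparing runs, the list above is easy to pull into a
script. A minimal sketch in Python, assuming the text is the raw
`llvm-profdata show -topn=N` output (without the "> " quoting) and that
each entry has the "name, max count = N" shape seen above. The names
`parse_topn` and `split_static` are hypothetical helpers, not part of
any LLVM tooling.

```python
import re

# Matches entries such as "memchr, max count = 15525452".
_ENTRY = re.compile(r"^\s*(?P<name>.+?), max count = (?P<count>\d+)\s*$")


def parse_topn(text):
    """Return [(function_name, max_count), ...] in the order listed."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            entries.append((m.group("name"), int(m.group("count"))))
    return entries


def split_static(name):
    """Split "file.c:func" into (file, func); file is None if absent."""
    path, sep, func = name.rpartition(":")
    return (path if sep else None, func)


# A few lines copied from the output quoted above.
SAMPLE = """Maximum function count: 62070879
Top 20 functions with the largest internal block counts:
  drivers/tty/n_tty.c:n_tty_write, max count = 83221158
  rcu_read_unlock_strict, max count = 62070879
  memchr, max count = 15525452
"""

if __name__ == "__main__":
    for name, count in parse_topn(SAMPLE):
        path, func = split_static(name)
        print(f"{count:>12}  {func}  ({path or 'external linkage'})")
```

The entries with a "file:" prefix are the ones with internal linkage
(static functions), so splitting on the last colon separates the source
path from the function name.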
> --
> Thanks,
> ~Nick Desaulniers
>