Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

From: Peter Zijlstra
Date: Sat Jun 12 2021 - 14:39:18 EST


On Sat, Jun 12, 2021 at 10:25:57AM -0700, Bill Wendling wrote:
> On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > > From: Sami Tolvanen <samitolvanen@xxxxxxxxxx>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > before it can be used during recompilation:
> > >
> > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can now be used by the compiler:
> > >
> > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we
> > > know works. This restriction can be lifted once other platforms have
> > > been verified to work with PGO.
> >
> > *sigh*, and not a single x86 person on Cc, how nice :-/
> >
> This tool is generic and, despite the fact that it's first enabled for
> x86, it contains no x86-specific code. The reason we're restricting it
> to x86 is because it's the platform we tested on.

You're modifying a lot of x86 files; you don't think it's good to let us
know? Worse, afaict this -fprofile-generate changes code generation,
and we definitely want to know about that.

> > >  arch/x86/Kconfig                  | 1 +
> > >  arch/x86/boot/Makefile            | 1 +
> > >  arch/x86/boot/compressed/Makefile | 1 +
> > >  arch/x86/crypto/Makefile          | 4 +
> > >  arch/x86/entry/vdso/Makefile      | 1 +
> > >  arch/x86/kernel/vmlinux.lds.S     | 2 +
> > >  arch/x86/platform/efi/Makefile    | 1 +
> > >  arch/x86/purgatory/Makefile       | 1 +
> > >  arch/x86/realmode/rm/Makefile     | 1 +
> > >  arch/x86/um/vdso/Makefile         | 1 +


> > > +CFLAGS_PGO_CLANG := -fprofile-generate
> > > +export CFLAGS_PGO_CLANG

> > And which of the many flags in noinstr disables this?
> >
> These flags aren't used with PGO. So there's no need to disable them.

Supposedly -fprofile-generate adds instrumentation to the generated
code. noinstr *MUST* disable that. If not, this is a complete
non-starter for x86.
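
For illustration only (this is not part of the patch, and not the
kernel's actual noinstr definition): a minimal sketch of a per-function
opt-out from profile instrumentation. It assumes a compiler that
provides the no_profile_instrument_function function attribute; GCC
documents it, and whether a given clang honours it under
-fprofile-generate is an assumption here. The names demo_noprofile and
demo_entry_path are hypothetical.

```c
/*
 * Hypothetical macro: expands to the opt-out attribute when the
 * compiler advertises it, and to nothing otherwise.
 */
#if defined(__has_attribute)
# if __has_attribute(no_profile_instrument_function)
#  define demo_noprofile __attribute__((no_profile_instrument_function))
# endif
#endif
#ifndef demo_noprofile
# define demo_noprofile
#endif

/*
 * Stand-in for a function that must carry no compiler-inserted
 * profiling counters, the way noinstr code must not.
 */
static demo_noprofile int demo_entry_path(int x)
{
	return x + 1;
}
```

Something along these lines would have to be folded into noinstr itself
so that every noinstr function is covered, rather than annotating
functions one at a time.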

> > Also, and I don't see this answered *anywhere*, why are you not using
> > perf for this? Your link even mentions Sampling Profilers (and I happen
> > to know there's been significant effort to make perf output work as
> > input for the PGO passes of the various compilers).
> >
> Instrumentation-based (non-sampling) profiling gives us a better
> context-sensitive profile, making PGO more impactful. It's also useful
> for coverage, whereas sampling profiles are not.

We've got KCOV and GCOV support already. Coverage is also not an
argument mentioned anywhere else. Coverage can go pound sand, we really
don't need a third means of getting that.

Do you have actual numbers that back up the sampling vs instrumented
argument? Having the instrumentation will affect performance, which can
skew the profile just the same.

Also, sampling tends to capture the hot spots very well.