Re: BPF vs objtool again

From: Alexei Starovoitov
Date: Wed Apr 29 2020 - 22:10:58 EST


On Wed, Apr 29, 2020 at 07:13:00PM -0500, Josh Poimboeuf wrote:
> On Wed, Apr 29, 2020 at 04:41:59PM -0700, Alexei Starovoitov wrote:
> > On Wed, Apr 29, 2020 at 04:51:59PM -0500, Josh Poimboeuf wrote:
> > > On Thu, Jul 18, 2019 at 12:14:08PM -0700, tip-bot for Josh Poimboeuf wrote:
> > > > Commit-ID: 3193c0836f203a91bef96d88c64cccf0be090d9c
> > > > Gitweb: https://git.kernel.org/tip/3193c0836f203a91bef96d88c64cccf0be090d9c
> > > > Author: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > > > AuthorDate: Wed, 17 Jul 2019 20:36:45 -0500
> > > > Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > > CommitDate: Thu, 18 Jul 2019 21:01:06 +0200
> > > >
> > > > bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
> > >
> > > For some reason, this
> > >
> > > __attribute__((optimize("-fno-gcse")))
> > >
> > > is disabling frame pointers in ___bpf_prog_run(). If you compile with
> > > CONFIG_FRAME_POINTER it'll show something like:
> > >
> > > kernel/bpf/core.o: warning: objtool: ___bpf_prog_run.cold()+0x7: call without frame pointer save/setup
> >
> > you mean it started to disable frame pointers from some version of gcc?
> > It wasn't doing this before, since objtool wasn't complaining, right?
> > Sounds like gcc bug?
>
> I actually think this warning has been around for a while. I just only
> recently looked at it. I don't think anything changed in GCC, it's just
> that almost nobody uses CONFIG_FRAME_POINTER, so it wasn't really
> noticed.
>
> > > Also, since GCC 9.1, the GCC docs say "The optimize attribute should be
> > > used for debugging purposes only. It is not suitable in production
> > > code." That doesn't sound too promising.
> > >
> > > So it seems like this commit should be reverted. But then we're back to
> > > objtool being broken again in the RETPOLINE=n case, which means no ORC
> > > coverage in this function. (See above commit for the details)
> > >
> > > Some ideas:
> > >
> > > - Skip objtool checking of that func/file (at least for RETPOLINE=n) --
> > > but then it won't have ORC coverage.
> > >
> > > - Get rid of the "double goto" in ___bpf_prog_run(), which simplifies it
> > > enough for objtool to understand -- but then the text explodes for
> > > RETPOLINE=y.
> >
> > How that will look like?
> > That could be the best option.
>
> For example:
>
> #define GOTO ({ goto *jumptable[insn->code]; })
>
> and then replace all 'goto select_insn' with 'GOTO;'
>
> The problem is that with RETPOLINE=y, the function text size grows from
> 5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly)
> reloads the jump table register into a scratch register.

that would be a tiny change, right?
I'd rather go with that and gate it with 'ifdef CONFIG_FRAME_POINTER'
Like:
#ifndef CONFIG_FRAME_POINTER
#define CONT ({ insn++; goto select_insn; })
#define CONT_JMP ({ insn++; goto select_insn; })
#else
#define CONT ({ insn++; goto *jumptable[insn->code]; })
#define CONT_JMP ({ insn++; goto *jumptable[insn->code]; })
#endif

The reason this CONT and CONT_JMP macros are there because a combination
of different gcc versions together with different cpus make branch predictor
behave 'unpredictably'.

I've played with CONT and CONT_JMP either both doing direct goto or
indirect goto and observed quite different performance characteristics
from the interpreter.
What you see right now was the best tune I could get from a set of cpus
I had to play with and compilers. If I did the same tuning today the outcome
could have been different.
So I think it's totally fine to use above code. I think some cpus may actually
see performance gains with certain versions of gcc.
The retpoline text increase is unfortunate but when retpoline is on
other security knobs should be on too. In particular CONFIG_BPF_JIT_ALWAYS_ON
should be on as well. Which will remove interpreter from .text completely.