Re: [PATCH -tip v3 1/2] kcov: Make runtime functions noinstr-compatible

From: Marco Elver
Date: Mon Jun 15 2020 - 03:53:23 EST


On Sat, 13 Jun 2020 at 19:24, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Fri, Jun 12, 2020 at 1:49 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> > On Fri, 12 Jun 2020, Dmitry Vyukov wrote:
> >
> > > On Thu, Jun 11, 2020 at 11:55 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Jun 08, 2020 at 01:01:08PM +0200, Peter Zijlstra wrote:
> > > > > On Mon, Jun 08, 2020 at 09:57:39AM +0200, Dmitry Vyukov wrote:
> > > > >
> > > > > > As a crazy idea: is it possible to employ objtool (linker script?) to
> > > > > > rewrite all coverage calls to nops in the noinstr section? Or relocate
> > > > > > to nop function?
> > > > > > What we are trying to do is very static, it _should_ have been done
> > > > > > during build. We don't have means in existing _compilers_ to do this,
> > > > > > but maybe we could do it elsewhere during build?...
> > > > >
> > > > > Let me try and figure out how to make objtool actually rewrite code.
> > > >
> > > > The below is quite horrific but seems to sorta work.
> > > >
[...]
> > > >
> > > > I'll have to dig around a little more to see if I can't get rid of the
> > > > relocation entirely. Also, I need to steal better arch_nop_insn() from
> > > > the kernel :-)
> > >
> > > Wow! Cool!
> > > Thanks for resolving this. I guess this can be used to wipe more
> > > unwanted things in future :)
> > >
> > > Marco double checked and his patch did not actually fix the existing
> > > crash under KCSAN. The call itself was the problem or something,
> > > returning early did not really help. This should hopefully fix it.
> > > Marco, please double check.
> > >
> > > Re better nop insn, I don't know how much work it is (or how much you
> > > are striving for perfection :)). But from KCOV point of view, I think
> > > we can live with more or less any nop insn. The main thing was
> > > removing overhead from all other (not noinstr) cases, I would assume
> > > the noinstr cases where we use nops are very rare. I mean don't spend
> > > too much time on it, if it's not needed for something else.
> > >
> > > Thanks again!
> >
> > This is great, thanks! To make noinstr not call into KCOV, this
> > definitely seems to do the job.
> >
> > Though sadly it doesn't fix the problem I'm seeing. The problem occurs
> > when I compile using Clang, and enable either KASAN or KCSAN together
> > with KCOV. Actually, turning off KCOV also shows this... a stacktrace is
> > below.
>
> I can't reproduce this after tuning off KCOV. Just KASAN works for me.
> Also the following helps (at least for my config):

Disabling KCOV for smp_processor_id now moves the crash elsewhere. In
the case of KASAN into its 'memcpy' wrapper, called after
__this_cpu_read in fixup_bad_iret. This is making me suspicious,
because it shouldn't be called from the noinstr functions.

For KCSAN the crash still happens in check_preemption_disabled, in the
inlined native_save_fl function (apparently on its 'pushf'). If I turn
fixup_bad_iret's __this_cpu_read into a raw_cpu_read (to bypass
check_preemption_disabled), no more crash with KCSAN.

> diff --git a/lib/Makefile b/lib/Makefile
> index b1c42c10073b9..8514519bc5bcb 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -17,6 +17,7 @@ KCOV_INSTRUMENT_list_debug.o := n
> KCOV_INSTRUMENT_debugobjects.o := n
> KCOV_INSTRUMENT_dynamic_debug.o := n
> KCOV_INSTRUMENT_fault-inject.o := n
> +KCOV_INSTRUMENT_smp_processor_id.o := n
>
>
> Btw, do you use inline instrumentation for KASAN or outline?
> I use inline KASAN, so maybe it's a function call that's the problem.
> KCOV uses calls and KCSAN also uses calls.

Using inline KASAN. The calls may be only indirectly contributing,
because at the point the BUG happens we're not in any runtime in the
case of smp_processor_id crash (noinstr is supposed to add __no_kcsan,
__no_sanitize_address etc). My guess here is that the compiler ends up
generating either a call or increased stack usage which causes a crash
in a random place.

> And it's not that we are getting that "BUG:", right? Otherwise we
> would see it in non-KCOV builds as well. So it must be something in
> the very beginning of the function...

For reference, I've attached my config (rebased on 5.8-rc1, with the
SAN-vs-noinstr series on top:
https://lore.kernel.org/lkml/20200604102241.466509982@xxxxxxxxxxxxx/).

Right now I'm not liking that fixup_bad_iret calls into KASAN's
memcpy. But that doesn't explain the crash with KCSAN. Maybe we're
running out of stack space?

Thanks,
-- Marco

Attachment: config
Description: Binary data