Re: WARNING in event_function_local

From: Steven Rostedt
Date: Tue Feb 12 2019 - 21:14:55 EST


On Tue, 12 Feb 2019 17:27:56 -0800
Kees Cook <keescook@xxxxxxxxxxxx> wrote:

> On Wed, May 9, 2018 at 10:18 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > On Sat, Jan 27, 2018 at 8:58 PM, syzbot
> > <syzbot+352bd10e338d9a90e5e0@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > Hello,
> > >
> > > syzbot hit the following crash on bpf-next commit
> > > 8223967fe0b8eb2448cca5cfe3c64a0838e6f60d (Sat Jan 27 01:06:23 2018 +0000)
> > > Merge branch 'fix-lpm-map'
> > >
> > > So far this crash happened 2 times on bpf-next.
> > > C reproducer is attached.
> > > syzkaller reproducer is attached.
> > > Raw console output is attached.
> > > compiler: gcc (GCC) 7.1.1 20170620
> > > .config is attached.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+352bd10e338d9a90e5e0@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > It will help syzbot understand when the bug is fixed. See footer for
> > > details.
> > > If you forward the report, please keep this part and the footer.
> >
> >
> > This was bisected to:
> >
> > commit 4ee7c60de83ac01fa4c33c55937357601631e8ad
> > Author: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> > Date: Fri Mar 23 10:18:03 2018 -0400
> >
> > init, tracing: Add initcall trace events
> >
> >
> >
> > > WARNING: CPU: 1 PID: 5105 at kernel/events/core.c:331
> > > event_function_local.constprop.102+0x494/0x560 kernel/events/core.c:331
> > > Kernel panic - not syncing: panic_on_warn set ...
> > >
> > > CPU: 1 PID: 5105 Comm: syzkaller318382 Not tainted 4.15.0-rc8+ #8
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Call Trace:
> > > <IRQ>
> > > __dump_stack lib/dump_stack.c:17 [inline]
> > > dump_stack+0x194/0x257 lib/dump_stack.c:53
> > > panic+0x1e4/0x41c kernel/panic.c:183
> > > __warn+0x1dc/0x200 kernel/panic.c:547
> > > report_bug+0x211/0x2d0 lib/bug.c:184
> > > fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> > > fixup_bug arch/x86/kernel/traps.c:247 [inline]
> > > do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> > > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> > > invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1096
> > > RIP: 0010:event_function_local.constprop.102+0x494/0x560
> > > kernel/events/core.c:331
> > > RSP: 0018:ffff8801db307d00 EFLAGS: 00010006
> > > RAX: ffff8801d902a180 RBX: ffff8801d60beac0 RCX: ffffffff81862a54
> > > RDX: 0000000000010000 RSI: ffff8801db3079f0 RDI: ffff8801d60bebc8
> > > RBP: ffff8801db307da0 R08: 1ffff1003b660edf R09: 000000000000000c
> > > R10: ffff8801db307bb8 R11: 0000000000000044 R12: ffff8801db330e00
> > > R13: 1ffff1003b660fa3 R14: ffff8801d90841c0 R15: ffff8801d979ce40
> > > perf_event_disable_local kernel/events/core.c:1979 [inline]
> > > perf_pending_event+0x1e5/0x240 kernel/events/core.c:5446
> > > irq_work_run_list+0x184/0x240 kernel/irq_work.c:155
> > > irq_work_run+0x1d/0x50 kernel/irq_work.c:170
> > > smp_irq_work_interrupt+0x103/0x640 arch/x86/kernel/irq_work.c:21
> > > irq_work_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:968
> > > </IRQ>
>
> I got curious why this report went silent, and bisected the "fix" to:

It's still tagged as "TODO" in my INBOX, along with 1000 other emails
tagged with TODO :-p

>
> 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
>
> This contains new tracepoints, and reverting them re-exposes the bug:
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 89d20ed1d2e8..462fab402bfa 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1468,15 +1468,21 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu
> *current_vcpu, u64 ingpa, u64 outgpa,
> all_cpus = false;
> valid_bank_mask = BIT_ULL(0);
>
> + panic("how am I using hyperV?!");
> + /*
> trace_kvm_hv_send_ipi(vector, sparse_banks[0]);
> + */
> } else {
> if (unlikely(kvm_read_guest(kvm, ingpa, &send_ipi_ex,
> sizeof(send_ipi_ex))))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
>
> + panic("how am I using hyperV?! (ex)");
> + /*
> trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector,
> send_ipi_ex.vp_set.format,
> send_ipi_ex.vp_set.valid_bank_mask);
> + */
>
> vector = send_ipi_ex.vector;
> valid_bank_mask = send_ipi_ex.vp_set.valid_bank_mask;
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 6432d08c7de7..895bd184feea 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -1424,6 +1424,7 @@ TRACE_EVENT(kvm_hv_flush_tlb_ex,
> /*
> * Tracepoints for kvm_hv_send_ipi.
> */
> +#if 0
> TRACE_EVENT(kvm_hv_send_ipi,
> TP_PROTO(u32 vector, u64 processor_mask),
> TP_ARGS(vector, processor_mask),
> @@ -1462,6 +1463,7 @@ TRACE_EVENT(kvm_hv_send_ipi_ex,
> __entry->vector, __entry->format,
> __entry->valid_bank_mask)
> );
> +#endif
> #endif /* _TRACE_KVM_H */
>
> #undef TRACE_INCLUDE_PATH
>
> Steve, what could possibly be happening here? Just adding more
> tracepoints causes some kind of race where the task vs current test
> trips in event_function_local():
>
> if (WARN_ON_ONCE(task != current))
> goto unlock;

That's perf code. How are you getting there? What's special about this
run? You have perf running?

-- Steve

>
> Is this maybe just an unlucky condition with the event loop running in
> an IRQ? Should the WARN be expected, or is running under an IRQ
> unexpected?
>
> It'd be nice to find the source of this, since it seems to have gotten
> papered over instead of actually fixed. (Especially since the original
> "introduction" bisect shows similar "just adding new trace events.)
>