Re: suspicious RCU usage in msr tracing.

From: Paul E. McKenney
Date: Fri Jan 22 2016 - 18:51:57 EST


On Fri, Jan 22, 2016 at 02:33:29PM -0800, Andi Kleen wrote:
> On Thu, Jan 21, 2016 at 07:34:27PM -0500, Dave Jones wrote:
> > Just hit this on Linus' current tree.
> >
> > [ INFO: suspicious RCU usage. ]
> > 4.4.0-think+ #1 Tainted: G W
> > -------------------------------
> > ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
> >
> > other info that might help us debug this:
> >
> >
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > RCU used illegally from extended quiescent state!
> > no locks held by swapper/3/0.
> >
> > stack backtrace:
> > CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 4.4.0-think+ #1
> > ffffffff8ef82ac0 c4dd1c3486ada576 ffff880468e07f08 ffffffff8e566ae1
> > ffff880464905340 ffff880468e07f38 ffffffff8e135bf8 ffffffff8f665b00
> > ffff880464918000 0000000000000000 ffff880464920000 ffff880468e07f70
> > Call Trace:
> > <IRQ> [<ffffffff8e566ae1>] dump_stack+0x4e/0x7d
> > [<ffffffff8e135bf8>] lockdep_rcu_suspicious+0xf8/0x110
> > [<ffffffff8e5b3f36>] do_trace_write_msr+0x136/0x140
> > [<ffffffff8e061753>] native_apic_msr_eoi_write+0x23/0x30
> > [<ffffffff8e054456>] smp_call_function_single_interrupt+0x36/0x50
> > [<ffffffff8e05447e>] smp_call_function_interrupt+0xe/0x10
> > [<ffffffff8ed1bdc0>] call_function_interrupt+0x90/0xa0
> > <EOI> [<ffffffff8e2f1b20>] ? __asan_store4+0x80/0x80
> > [<ffffffff8eac6477>] ? poll_idle+0x67/0xc0
> > [<ffffffff8eac5a94>] cpuidle_enter_state+0x174/0x430
> > [<ffffffff8eac5da7>] cpuidle_enter+0x17/0x20
> > [<ffffffff8e128ff5>] cpu_startup_entry+0x4c5/0x5a0
> > [<ffffffff8e128b30>] ? default_idle_call+0x60/0x60
> > [<ffffffff8e18d354>] ? clockevents_config_and_register+0x64/0x70
> > [<ffffffff8e055489>] start_secondary+0x269/0x300
> > [<ffffffff8e055220>] ? set_cpu_sibling_map+0x970/0x970
> > ------------[ cut here ]------------
> >
> > 44 DEFINE_EVENT(msr_trace_class, write_msr,
> > 45 TP_PROTO(unsigned msr, u64 val, int failed),
> > 46 TP_ARGS(msr, val, failed)
> > 47 );
> >
> > Andi, could this be caused by 7f47d8cc039f8746e0038fe05f1ddcb15a2e27f0 ?
>
> Yes, it's likely. The trace points use RCU, and with X2APIC on there is a
> MSR write with trace point at the end up of interrupts to ack the APIC.
>
> This is what the IPI uses:
>
> static inline void entering_ack_irq(void)
> {
> ack_APIC_irq();
> entering_irq();
> }
>
> I assume the problem is that RCU thinks we're in idle until
> the entering_irq(), but the MSR write with the new trace point
> is already in the ack_APIC_irq()

Just confirming, yes, RCU thinks we are in idle until entering_irq()
gets to the call to rcu_irq_enter().

Thanx, Paul

> Thomas, can the two call be reordered? I assume it's safe.
>
> -Andi
>
> --
> ak@xxxxxxxxxxxxxxx -- Speaking for myself only
>