Re: [PATCH v2 2/2] tracing/rcu: don't trace rcu_callback on offline CPUs

From: Paul E. McKenney
Date: Mon Feb 15 2016 - 19:48:43 EST


On Sun, Feb 14, 2016 at 10:05:22PM -0800, Paul E. McKenney wrote:
> On Sun, Feb 14, 2016 at 09:50:18AM +0300, Denis Kirjanov wrote:
> > Tracepoints use RCU for protection and they must not be called on
> > offline CPUs. So make this tracepoint conditional.
>
> Good catch! Queued for review and testing.

And I dequeued this in favor of Steven's more recent patch.

But there is one other hitch... It is currently not legal to invoke
call_rcu() from an offline CPU. You can get away with it during a
short window towards the end of the offline process, but shortly after
the outgoing CPU hits the idle loop, call_rcu() will splat and leak
the callback.
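
To make that window concrete, here is a toy user-space model of the
behavior described above. It is only a sketch and not kernel code: the
three states, the counters, and every model_* name are hypothetical
stand-ins, and the real transition happens "shortly after" the idle
loop is reached rather than at a single well-defined state change.

```c
/* Toy user-space model; NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

enum model_cpu_state {
	MODEL_CPU_ONLINE,	/* normal operation */
	MODEL_CPU_GOING_DOWN,	/* late in the offline process, still tolerated */
	MODEL_CPU_DEAD_IDLE,	/* outgoing CPU has reached the idle loop */
};

struct model_cpu {
	enum model_cpu_state state;
	size_t queued;		/* callbacks that would eventually be invoked */
	size_t leaked;		/* callbacks that would never be invoked */
	size_t warnings;	/* stand-in for the splat */
};

/* Static storage is zeroed, so every CPU starts as MODEL_CPU_ONLINE. */
static struct model_cpu model_cpus[MODEL_NR_CPUS];

/* Returns true if the callback was queued, false if it was leaked. */
static bool model_call_rcu(unsigned int cpu)
{
	struct model_cpu *c;

	assert(cpu < MODEL_NR_CPUS);
	c = &model_cpus[cpu];

	if (c->state == MODEL_CPU_DEAD_IDLE) {
		/* Past the window: complain and drop the callback. */
		c->warnings++;
		c->leaked++;
		return false;
	}

	/* Online, or still inside the short window near the end of offline. */
	c->queued++;
	return true;
}
```

In this model a caller on CPU 4 succeeds while the state is
MODEL_CPU_ONLINE or MODEL_CPU_GOING_DOWN, and gets a warning plus a
leaked callback once the state is MODEL_CPU_DEAD_IDLE.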

So what exactly is the purpose of invoking call_rcu() from an offline CPU?
(Yes, I could probably make it work, but there needs to be a good reason.)

Thanx, Paul

> > [ INFO: suspicious RCU usage. ]
> > [ 413.344670] 4.4.0-00006-g0fe53e8-dirty #33 Tainted: G S
> > [ 413.344672] -------------------------------
> > [ 413.344673] include/trace/events/rcu.h:457 suspicious rcu_dereference_check() usage!
> > [ 413.344674]
> > other info that might help us debug this:
> >
> > [ 413.344676]
> > RCU used illegally from offline CPU!
> > rcu_scheduler_active = 1, debug_locks = 1
> > [ 413.344678] no locks held by swapper/4/0.
> > [ 413.344679]
> > stack backtrace:
> > [ 413.344682] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #33
> > [ 413.344683] Call Trace:
> > [ 413.344692] [c0000005b76b7800] [c0000000008bb080] .dump_stack+0x98/0xd4 (unreliable)
> > [ 413.344698] [c0000005b76b7880] [c00000000010c8b8] .lockdep_rcu_suspicious+0x108/0x170
> > [ 413.344703] [c0000005b76b7910] [c00000000013b9e4] .__call_rcu.constprop.60+0x264/0x600
> > [ 413.344708] [c0000005b76b79e0] [c0000000002bceec] .put_object+0x5c/0x80
> > [ 413.344712] [c0000005b76b7a60] [c00000000029a368] .kmem_cache_free+0x298/0x450
> > [ 413.344716] [c0000005b76b7b00] [c000000000093494] .__mmdrop+0x54/0x150
> > [ 413.344720] [c0000005b76b7b90] [c0000000000e4010] .idle_task_exit+0x130/0x140
> > [ 413.344725] [c0000005b76b7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
> > [ 413.344730] [c0000005b76b7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
> > [ 413.344734] [c0000005b76b7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
> > [ 413.344738] [c0000005b76b7db0] [c000000000101e8c] .cpu_startup_entry+0x50c/0x560
> > [ 413.344741] [c0000005b76b7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
> > [ 413.344746] [c0000005b76b7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
> >
> > Signed-off-by: Denis Kirjanov <kda@xxxxxxxxxxxxxxxxx>
> > ---
> >
> > v2: Fix the build error that was introduced
> > while sending the patches from another machine
> >
> > include/trace/events/rcu.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index ef72c4a..793d306b 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -428,13 +428,15 @@ TRACE_EVENT(rcu_prep_idle,
> > * number of lazy callbacks queued, and the fourth element is the
> > * total number of callbacks queued.
> > */
> > -TRACE_EVENT(rcu_callback,
> > +TRACE_EVENT_CONDITION(rcu_callback,
> >
> > TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen_lazy,
> > long qlen),
> >
> > TP_ARGS(rcuname, rhp, qlen_lazy, qlen),
> >
> > + TP_CONDITION(cpu_online(raw_smp_processor_id())),
> > +
> > TP_STRUCT__entry(
> > __field(const char *, rcuname)
> > __field(void *, rhp)
> > --
> > 2.4.0
> >