Re: call_rcu from trace_preempt

From: Paul E. McKenney
Date: Mon Jun 15 2015 - 22:15:11 EST

Next message: Theodore Ts'o: "[PATCH] builddeb: sign the modules after splitting out the debuginfo files"
Previous message: Jiang Liu: "Re: [lkp] [x86/MSI] 52f518a3a7c: -30.5% netperf.Throughput_tps"
In reply to: Alexei Starovoitov: "Re: call_rcu from trace_preempt"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Jun 15, 2015 at 06:09:56PM -0700, Alexei Starovoitov wrote:
> On 6/15/15 4:07 PM, Paul E. McKenney wrote:
> >
> >Oh... One important thing is that both call_rcu() and kfree_rcu()
> >use per-CPU variables, managing a per-CPU linked list. This is why
> >they disable interrupts. If you do another call_rcu() in the middle
> >of the first one in just the wrong place, you will have two entities
> >concurrently manipulating the same linked list, which will not go well.
>
> yes. I'm trying to find that 'wrong place'.
> The trace.patch is doing kmalloc/kfree_rcu for every preempt_enable.
> So any spin_unlock called by first call_rcu will be triggering
> 2nd recursive to call_rcu.
> But as far as I could understand rcu code that looks ok everywhere.
> call_rcu
> debug_rcu_head_[un]queue
> debug_object_activate
> spin_unlock
>
> and debug_rcu_head* seems to be called from safe places
> where local_irq is enabled.

I do sympathize, but your own testing does demonstrate that it is
very much not OK. ;-)

> >Maybe mark call_rcu() and the things it calls as notrace? Or you
> >could maintain a separate per-CPU linked list that gathered up the
> >stuff to be kfree()ed after a grace period, and some time later
> >feed them to kfree_rcu()?
>
> yeah, I can think of this or 10 other ways to fix it within
> kprobe+bpf area, but I think something like call_rcu_notrace()
> may be a better solution.
> Or may be single generic 'fix' for call_rcu will be enough if
> it doesn't affect all other users.

Why do you believe that it is better to fix it within call_rcu()?

> >The usual consequence of racing a pair of callback insertions on the
> >same CPU would be that one of them gets leaked, and possible all
> >subsequent callbacks. So the lockup is no surprise. And there are a
> >lot of other assumptions in nearby code paths about only one execution
> >at a time from a given CPU.
>
> yes, I don't think calling 2nd call_rcu from preempt_enable violates
> this assumptions. local_irq does it job. No extra stuff is called when
> interrupts are disabled.

Perhaps you are self-deadlocking within __call_rcu_core(). If you have
not already done so, please try running with CONFIG_PROVE_LOCKING=y.
(Yes, I see your point about not calling extra stuff when interrupts
are disabled, and I remember that __call_rcu() avoids that path when
interrupts are disabled, but -something- is clearly going wrong!
Or maybe something momentarily enables interrupts somewhere, and RCU
has just been getting lucky. Or...)

> >>Any advise on where to look is greatly appreciated.
> >
> >What I don't understand is exactly what you are trying to do. Have more
> >complex tracers that dynamically allocate memory? If so, having a per-CPU
> >list that stages memory to be freed so that it can be passed to call_rcu()
> >in a safe environment might make sense. Of course, that list would need
> >to be managed carefully!
>
> yes. We tried to compute the time the kernel spends between
> preempt_disable->preempt_enable and plot a histogram of latencies.
>
> >Or am I missing the point of the code below?
>
> this trace.patch is reproducer of call_rcu crashes that doing:
> preempt_enable
> trace_preempt_on
> kfree_call_rcu
>
> The real call stack is:
> preempt_enable
> trace_preempt_on
> kprobe_int3_handler
> trace_call_bpf
> bpf_map_update_elem
> htab_map_update_elem
> kree_call_rcu

I suspect that your problem may range quite a bit further than just
call_rcu(). For example, in your stack trace, you have a recursive
call to debug_object_activate(), which might not be such good thing.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Theodore Ts'o: "[PATCH] builddeb: sign the modules after splitting out the debuginfo files"
Previous message: Jiang Liu: "Re: [lkp] [x86/MSI] 52f518a3a7c: -30.5% netperf.Throughput_tps"
In reply to: Alexei Starovoitov: "Re: call_rcu from trace_preempt"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]