Re: call_rcu from trace_preempt

From: Alexei Starovoitov
Date: Mon Jun 15 2015 - 21:10:11 EST


On 6/15/15 4:07 PM, Paul E. McKenney wrote:

> Oh... One important thing is that both call_rcu() and kfree_rcu()
> use per-CPU variables, managing a per-CPU linked list. This is why
> they disable interrupts. If you do another call_rcu() in the middle
> of the first one in just the wrong place, you will have two entities
> concurrently manipulating the same linked list, which will not go well.

yes. I'm trying to find that 'wrong place'.
The trace.patch is doing kmalloc/kfree_rcu for every preempt_enable,
so any spin_unlock called from within the first call_rcu will trigger
a 2nd, recursive call_rcu.
But as far as I can tell, the rcu code looks ok everywhere:
call_rcu
debug_rcu_head_[un]queue
debug_object_activate
spin_unlock

and the debug_rcu_head* helpers seem to be called only from safe places,
where local interrupts are still enabled.
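
To make the hazard Paul describes concrete, a call_rcu()-style per-CPU
callback list looks roughly like this (simplified sketch, not the actual
kernel code; all names here are made up):

struct cb_list {
        struct rcu_head *head;
        struct rcu_head **tail;         /* init: tail = &head (omitted) */
};
static DEFINE_PER_CPU(struct cb_list, cb_lists);

static void sketch_call_rcu(struct rcu_head *rhp,
                            void (*func)(struct rcu_head *))
{
        unsigned long flags;
        struct cb_list *l;

        rhp->func = func;
        rhp->next = NULL;
        local_irq_save(flags);          /* protect this CPU's list */
        l = this_cpu_ptr(&cb_lists);
        *l->tail = rhp;                 /* a reentrant call between these  */
        l->tail = &rhp->next;           /* two stores sees a stale ->tail, */
        local_irq_restore(flags);       /* so one callback gets leaked     */
}

A tracing hook cannot fire inside the irq-disabled region, but anything
called before or after it can recurse.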

> Maybe mark call_rcu() and the things it calls as notrace? Or you
> could maintain a separate per-CPU linked list that gathered up the
> stuff to be kfree()ed after a grace period, and some time later
> feed them to kfree_rcu()?

yeah, I can think of this or 10 other ways to fix it within the
kprobe+bpf area, but I think something like call_rcu_notrace()
may be a better solution.
Or maybe a single generic fix for call_rcu will be enough, if
it doesn't affect all the other users.
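
Paul's staging idea could look roughly like this (untested sketch; the
names stage_elem, free_stage, stage_for_free and drain_stage are made up
for illustration). llist_add() is a lock-free cmpxchg, so it stays safe
even if the tracer recurses into it:

struct stage_elem {
        struct llist_node llnode;
        struct rcu_head rcu;
};
static DEFINE_PER_CPU(struct llist_head, free_stage);

/* called from the tracing hook instead of kfree_rcu() */
static void stage_for_free(struct stage_elem *e)
{
        llist_add(&e->llnode, this_cpu_ptr(&free_stage));
}

/* called later from a safe context, e.g. a workqueue */
static void drain_stage(void)
{
        struct llist_node *list = llist_del_all(this_cpu_ptr(&free_stage));
        struct stage_elem *e, *tmp;

        llist_for_each_entry_safe(e, tmp, list, llnode)
                kfree_rcu(e, rcu);
}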

> The usual consequence of racing a pair of callback insertions on the
> same CPU would be that one of them gets leaked, and possibly all
> subsequent callbacks. So the lockup is no surprise. And there are a
> lot of other assumptions in nearby code paths about only one execution
> at a time from a given CPU.

yes, and I don't think calling a 2nd call_rcu from preempt_enable
violates these assumptions: local_irq does its job, and no extra
code is called while interrupts are disabled.

Any advice on where to look is greatly appreciated.

> What I don't understand is exactly what you are trying to do. Have more
> complex tracers that dynamically allocate memory? If so, having a per-CPU
> list that stages memory to be freed so that it can be passed to call_rcu()
> in a safe environment might make sense. Of course, that list would need
> to be managed carefully!

yes. We tried to compute the time the kernel spends between
preempt_disable and preempt_enable, and to plot a histogram of the latencies.
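
In the style of the samples/bpf programs of that era, the probes look
roughly like this (illustrative sketch; the map layout and the missing
histogram step are made up):

#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") start_ts = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(u32),
        .value_size = sizeof(u64),
        .max_entries = 1024,
};

SEC("kprobe/trace_preempt_off")
int probe_off(struct pt_regs *ctx)
{
        u32 cpu = bpf_get_smp_processor_id();
        u64 ts = bpf_ktime_get_ns();

        bpf_map_update_elem(&start_ts, &cpu, &ts, BPF_ANY);
        return 0;
}

SEC("kprobe/trace_preempt_on")
int probe_on(struct pt_regs *ctx)
{
        u32 cpu = bpf_get_smp_processor_id();
        u64 *tsp = bpf_map_lookup_elem(&start_ts, &cpu);

        if (tsp) {
                u64 delta = bpf_ktime_get_ns() - *tsp;
                /* ... bucket delta into a histogram map ... */
        }
        return 0;
}

It's the bpf_map_update_elem() in the hook that ends up in
kfree_call_rcu(), per the stack below.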

> Or am I missing the point of the code below?

this trace.patch is a reproducer of the call_rcu crashes; it does:
preempt_enable
trace_preempt_on
kfree_call_rcu

The real call stack is:
preempt_enable
trace_preempt_on
kprobe_int3_handler
trace_call_bpf
bpf_map_update_elem
htab_map_update_elem
kfree_call_rcu
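
For context, the tail of htab_map_update_elem() (paraphrased sketch of
kernel/bpf/hashtab.c from that era, not a verbatim quote) is where the
kfree_call_rcu() comes from:

        /* the new element replaces the old one under the bucket lock;
         * the old element is freed via kfree_rcu(), which expands to
         * kfree_call_rcu() */
        hlist_add_head_rcu(&l_new->hash_node, head);
        if (l_old) {
                hlist_del_rcu(&l_old->hash_node);
                kfree_rcu(l_old, rcu);
        }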
