Re: [PATCH] tracing: add trace_event_read_lock()

From: Paul E. McKenney
Date: Mon May 25 2009 - 18:10:11 EST


On Wed, May 20, 2009 at 02:59:50AM +0200, Frederic Weisbecker wrote:
> On Tue, May 19, 2009 at 05:38:53AM -0700, Paul E. McKenney wrote:
> > On Tue, May 19, 2009 at 01:15:44PM +0800, Lai Jiangshan wrote:
> > > Paul E. McKenney wrote:
> > > > On Mon, May 18, 2009 at 03:59:31PM +0200, Frederic Weisbecker wrote:
> > > >> On Mon, May 18, 2009 at 07:35:34PM +0800, Lai Jiangshan wrote:
> > > >>> I found that there is nothing to protect event_hash in
> > > >>> ftrace_find_event().
> > > >> Actually, rcu protects it, but not enough. We have neither
> > > >> synchronize_rcu() nor rcu_read_lock.
> > > >>
> > > >> So we protect against concurrent hlist accesses.
> > > >> But the event can be removed when a module is unloaded,
> > > >> and that can happen between the time we get the event output
> > > >> callback and the time we actually use it.
> > > >
> > > > I will ask the stupid question... Would invoking rcu_barrier() in the
> > > > module-exit function take care of this? The rcu_barrier() primitive
> > > > waits for all in-flight RCU callbacks to complete execution.
> > > >
> > > > Thanx, Paul
> > > >
> > >
> > > We have no call_rcu* in it.
> >
> > OK, I will ask the next stupid question... Is it possible to use the
> > same trick rcu_barrier() uses, but for your events?
> >
> > Thanx, Paul
>
>
> Hi Paul,
>
> I think you're right, we could use RCU to solve this race.
>
> The current race window to solve is as follows:
>
>
> --Task A--
>
> print_trace_line(trace) {
> event = find_ftrace_event(trace) {
> hlist_for_each_entry_rcu(....) {
> ...
> return found;
> }
> }
>
>
> --Task B--
>
> /*
> * Can be quite confusing.
> * But we have "trace events" which are global objects
> * handling tracing, output, everything.
> * And we have also "ftrace events" which are unique
> * identifiers on traces than make them able to retrieve
> * their associated output callbacks.
> * So a "trace event" has an associated "ftrace event" to output
> * itself.
> */
>
> trace_module_remove_events(mod) {
> list_trace_events_module(ev, mod) {
> unregister_ftrace_event(ev->event) {
> // mutex protected
> hlist_del(ev->event->node)
> list_del(....)
> //
> }
> }
> }
> //module removed

Ouch!!! ;-)

> -- Task A--
>
> event->print(trace)
>
> KABOOM! print callback was on the module. event doesn't even exist anymore.
>
> So Lai's solution is correct to fix it.
> But rcu is also valid.
>
> Currently rcu only protects the hashlist but not the callback until
> the job is complete.
>
> What we could do is:
>
> (Read side)
>
> rcu_read_lock();
>
> event = find_ftrace_event();
> event->print(trace);

As long as event->print() never blocks...

Though you could use SRCU if it might block:

rcu_read_lock() -> i = srcu_read_lock(&my_srcu_struct);
rcu_read_unlock() -> srcu_read_unlock(&my_srcu_struct, i);
synchronize_rcu() -> synchronize_srcu(&my_srcu_struct);

> rcu_read_unlock()
>
> (Write side)
>
> unregister_ftrace_event(e)
> {
> mutex_lock(&writer);
> hlist_del(e);
> list_del(e);
> mutex_unlock(&writer);
>
> // Wait for every printing grace periods
> synchronize_rcu();
> }
>
>
> Then the only overhead is a preempt_disable()/preempt_enable()
> for each trace to be printed. Just a sub/add pair and quite few
> preemption disabled latency, just the time for a hashlist search
> and some work done to format a trace to be printed.
> But if latency does really matters, we have RCU_PREEMPT.
>
> On the other side, the down_read() is an extra call but it is called
> only once for a whole loop of traces seeking.
>
> I guess both solutions are valid.
> What do you think?

It could go either way. As a very rough rule of thumb, if the down_read()
happens seldom, there is little advantage to RCU. On the other hand,
if the down_read() could happen often enough that a new reader shows up
before the previous down_read() completes, then you really need RCU or
something like it.

On most systems, if down_read() was happening more than a few million
times per second, life might be hard. On the other hand, if down_read()
never happened more than a few tens of thousand times per second or so,
should be no problem.

Since it sounds like a single down_read() substitutes for a potentially
large number of RCU read-side critical sections, I do not believe that
contention-free read-side overhead would be a problem.

Does that help?

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/