Re: Use case for TASKS_RCU

From: Paul E. McKenney
Date: Tue May 23 2017 - 17:10:26 EST


On Tue, May 23, 2017 at 04:38:53PM -0400, Steven Rostedt wrote:
> On Tue, 23 May 2017 13:00:35 -0700
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
>
> > > > Unfortunately, it does not work, as I should have known ahead of
> > > > time from the dyntick-idle experience. Not all context switches
> > > > go through context_switch(). :-/
> > >
> > > Wait. What context switch doesn't go through a context switch? Or do
> > > you mean a user/kernel context switch?
> >
> > I mean that putting printk() before and after the call to
> > context_switch() can show tasks switching out twice without switching
> > in and vice versa. No sign of lost printk()s, and I also confirmed
> > this behavior using a flag in task_struct.
>
> I hope you meant trace_printk()s, as printk() is a huge overhead and can
> cause side effects.

Not so much during boot. But actually, I meant to ask you about that...

From what I can see from the ftrace documentation, booting with something
like this:

ftrace=function ftrace_filter=tasks_rcu_qs,tasks_rcu_qs_enter,tasks_rcu_qs_exit

Should enable ftrace, but only on the three functions called out.
But when I try this, I get the following in dmesg:

[ 1.506171] ftrace bootup tracer 'function' not registered

And I don't get anything from ftrace_dump() later on.

What am I doing wrong here? (Event tracing has worked for me in the
past from the boot line, but I was lazy so just fell back to printk().
And I didn't think of trace_printk().)

> > One way that this can happen on some architectures is via the "helper"
> > mechanism, where the task sleeps normally, but where a later interrupt
> > or exception takes on its context "behind the scenes" in the arch
> > code. This is what messed up my attempt to use a simple
> > interrupt-nesting counter for RCU dynticks some years back. What I
> > counted on there was that the idle loop would never do that sort of
> > thing, so I could zero the count when entering idle from process
> > context.
> >
> > But I have not yet found a similar trick for counting voluntary
> > context switches.
> >
> > I also tried making context_switch() look like a momentary quiescent
> > state, but of course that means that tasks that block forever also
> > block the grace period forever. At which point, I need to scan the
> > task list to find them. And that pretty much brings me back to the
> > current RCU-tasks implementation. :-/
>
> Nothing should block in a preempted state forever, and if it does, that
> means we want to wait forever. Because it could be preempted on the
> trampoline.

Blocking in a preempted state is not the problem here. Given that
the obvious hooks don't seem to be catching all of the switch-to and
switch-from events, blocking forever in a not-preempted state is
the problem. I either need some way to see all of the switch-from
and switch-to events (and the ways I can see to do this have patch-size
and maintainability issues), or I need to go back to scanning the
task list.
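
For illustration only, here is a user-space sketch of the kind of pairing
check described above: a per-task flag that is set by a switch-from hook and
cleared by a switch-to hook, with a complaint whenever two events of the same
kind arrive back to back. Every name in it (demo_task, demo_note_switch_out(),
demo_note_switch_in(), demo_replay()) is hypothetical, and it is not the
instrumentation actually used in the kernel. It only shows why a missed hook
shows up as "switched out twice without switching in".

```c
/* User-space sketch only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_task {
	bool switched_out;	/* hypothetical stand-in for a flag in task_struct */
	int unpaired;		/* events that did not pair with their opposite */
};

/* Hook placed before the (hypothetical) switch-from point. */
static void demo_note_switch_out(struct demo_task *t)
{
	if (t->switched_out)
		t->unpaired++;	/* out twice without an intervening in */
	t->switched_out = true;
}

/* Hook placed after the (hypothetical) switch-to point. */
static void demo_note_switch_in(struct demo_task *t)
{
	if (!t->switched_out)
		t->unpaired++;	/* in twice without an intervening out */
	t->switched_out = false;
}

/*
 * Replay a string of hook events for one task that starts out running:
 * 'o' means the switch-out hook fired, 'i' means the switch-in hook fired.
 * Any other character is ignored.  Returns the number of unpaired events.
 */
static int demo_replay(const char *events)
{
	struct demo_task t = { .switched_out = false, .unpaired = 0 };
	size_t i;

	assert(events != NULL);
	for (i = 0; events[i] != '\0'; i++) {
		if (events[i] == 'o')
			demo_note_switch_out(&t);
		else if (events[i] == 'i')
			demo_note_switch_in(&t);
	}
	return t.unpaired;
}
```

Replaying "oioi" reports no unpaired events, while "oioo" reports one: the
second switch-in happened somewhere the hook could not see it, which is the
symptom described above.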

And of course, all of the approaches that update state upon context
switch are slowing down a fastpath for the benefit of a slowpath,
which is not necessarily all that good of a thing.

Thanx, Paul