Re: [PATCH] nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()

From: Frederic Weisbecker
Date: Mon Nov 21 2011 - 10:23:59 EST


On Sun, Nov 20, 2011 at 09:28:19PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 21, 2011 at 02:46:58AM +0100, Frederic Weisbecker wrote:
> > 2011/11/19 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> > > On Thu, Nov 17, 2011 at 05:03:44PM -0800, Paul E. McKenney wrote:
> > >> On Thu, Nov 17, 2011 at 12:11:34PM -0800, Josh Triplett wrote:
> > >> > On Thu, Nov 17, 2011 at 06:48:14PM +0100, Frederic Weisbecker wrote:
> > >> > > Those two APIs were provided to merge the calls to
> > >> > > tick_nohz_idle_enter() and rcu_idle_enter() into a single
> > >> > > irq-disabled section, so that no interrupt arriving in between would
> > >> > > needlessly process any RCU work.
> > >> > >
> > >> > > Now we are talking about an optimization whose benefits have
> > >> > > yet to be measured. Let's start simple and completely decouple
> > >> > > the idle RCU and dyntick-idle logic.
> > >> > >
> > >> > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > >> > > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > >> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > >> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > >> > > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > >> >
> > >> > Reviewed-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > >>
> > >> Merged, thank you both!
> > >
> > > And here is a patch on top of yours to allow nesting of rcu_idle_enter()
> > > and rcu_idle_exit().  Thoughts?
> > >
> > >                                                        Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
> > >
> > > Running user tasks in dyntick-idle mode requires RCU to undergo
> > > an idle-to-non-idle transition on each entry into the kernel, and
> > > vice versa on each exit from the kernel.  However, situations where
> > > user tasks cannot run in dyntick-idle mode (for example, when there
> > > is more than one runnable task on the CPU in question) also require
> > > RCU to undergo an idle-to-non-idle transition when coming out of the
> > > idle loop (and vice versa when entering the idle loop).
> >
> > Not sure what you mean about the idle loop in relation to the dyntick-idle
> > mode, which we can't enter when we resume to userspace with more than one
> > task in the runqueue.
> >
> > >  In this case,
> > > RCU would see one idle-to-non-idle transition when the task became
> > > runnable, and another when the task executed a system call.
> >
> > I'm a bit confused by this changelog.
> >
> > What can happen with the adaptive tickless thing is:
> >
> > - When we resume to userspace after a syscall/irq/exception and we are
> > not in an RCU extended quiescent state, we switch to it. We may call it
> > RCU idle mode, I guess, but that may start to be confusing.
> > This may involve several kinds of nesting, from a single rcu_idle_enter()
> > to more complicated scenarios if we switch to RCU extended qs from an
> > interrupt: rcu_idle_exit() is called on irq entry, rcu_idle_enter() is
> > called in the middle, and finally one last rcu_idle_enter() is called on
> > irq exit. Only at that point do we want the RCU extended qs to take effect.
> >
> > - We may also exit that RCU extended qs through other funny
> > nesting. The simple case is syscall entry, which just calls rcu_idle_exit()
> > if we were in userspace in RCU extended qs.
>
> OK, so perhaps this is what I am missing. Do you avoid calling
> rcu_idle_exit() in the case where the user-mode execution was not an
> RCU extended quiescent state? If so, then my patch is not needed,
> and I can revert it.

Yes. If we resume to userspace after a syscall but have more than one
task in the runqueue, we don't switch to RCU extended qs: we don't call
rcu_idle_enter() on syscall return in this case, so we won't call
rcu_idle_exit() on the next syscall entry either.
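The rule above can be sketched as a small user-space model. This is a sketch under stated assumptions, not the kernel code: the struct, its fields and both functions are hypothetical, and the counters merely stand in for calls to rcu_idle_enter() and rcu_idle_exit(). The point it illustrates is that syscall entry only leaves the extended QS when the previous return to userspace actually entered it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model; none of these names come from the
 * patch set. The counters stand in for rcu_idle_enter()/rcu_idle_exit().
 */
struct cpu_state {
	int nr_running;      /* runnable tasks on this CPU's runqueue */
	bool user_in_eqs;    /* did the return to user enter an RCU extended QS? */
	int rcu_idle_enters; /* stand-in calls to rcu_idle_enter() */
	int rcu_idle_exits;  /* stand-in calls to rcu_idle_exit() */
};

/* Return to userspace: enter the extended QS only if the task runs alone. */
static void resume_userspace(struct cpu_state *cpu)
{
	assert(cpu->nr_running >= 1); /* the task we return to is runnable */
	if (cpu->nr_running == 1) {
		cpu->rcu_idle_enters++;
		cpu->user_in_eqs = true;
	}
}

/* Syscall entry: leave the extended QS only if we actually entered it. */
static void syscall_entry(struct cpu_state *cpu)
{
	if (cpu->user_in_eqs) {
		cpu->rcu_idle_exits++;
		cpu->user_in_eqs = false;
	}
}
```

With two runnable tasks, resume_userspace() followed by syscall_entry() makes no stand-in RCU calls at all; with a single task, the calls stay strictly paired.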

>
> > We may also receive an IPI
> > that enqueues a new task, in which case we may exit the RCU extended
> > quiescent state from the irq with the following nesting:
> > rcu_idle_exit() on irq entry, then another call to rcu_idle_exit() to prevent
> > resuming the RCU extended quiescent state when we come back
> > to userspace, and finally the rcu_idle_enter() on irq exit.
> >
> > Is that what you had in mind?
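Both irq scenarios above can be checked against a simple nesting counter, which is roughly what allowing rcu_idle_enter() and rcu_idle_exit() to nest amounts to. This is a hypothetical model, not the kernel's implementation: the counter and the model_* functions are illustrative names, and the convention that 0 means "in extended QS" while anything above 0 means "RCU is watching" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-CPU nesting counter: 0 means RCU is in an extended
 * quiescent state, anything above 0 means RCU is watching this CPU.
 */
static int rcu_nesting = 1; /* start: running with RCU watching */

/* Stand-in for a nestable rcu_idle_exit(): one more reason to watch. */
static void model_rcu_idle_exit(void)
{
	rcu_nesting++;
}

/* Stand-in for a nestable rcu_idle_enter(): drop one reason to watch. */
static void model_rcu_idle_enter(void)
{
	assert(rcu_nesting > 0); /* an unbalanced enter would be a bug */
	rcu_nesting--;
}

static bool model_in_extended_qs(void)
{
	return rcu_nesting == 0;
}
```

In the first scenario (user mode, not yet in extended QS) the sequence exit/enter/enter goes 1, 2, 1, 0, so the extended QS takes effect only on irq exit. In the IPI scenario (starting from 0) the sequence exit/exit/enter goes 0, 1, 2, 1, so the CPU stays out of the extended QS when it returns to userspace.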
>
> I was concerned about the following scenario:
>
> 1. A CPU is initially idle.
>
> 2. Task A wakes up on that CPU, enters user-mode execution
> in an RCU extended quiescent state.

Just in case, I would like to note what happens in detail here:

- Idle notices the need to reschedule, leaves its idle loop and
calls rcu_idle_exit(). The scheduler's context switch may need
RCU, and we don't know where the next task will resume: if it resumes
in the kernel, it may need RCU as well. So we need this unconditional
rcu_idle_exit() that re-enables RCU.

- We also re-enable the tick unconditionally at idle exit. So
when the user task resumes, the tick is running and may decide to stop
again, in which case we may call rcu_idle_enter() if we are in
userspace. Otherwise this is done later, when we resume to userspace
(after a syscall or exception).
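The two steps above can be sketched as follows. This sketch is built on assumptions: the struct and functions are hypothetical stand-ins, the boolean fields model the effect of rcu_idle_exit()/rcu_idle_enter() and of restarting or stopping the tick, and "the tick stops when exactly one task is runnable" is a simplification of the real stop decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; stand-ins, not kernel APIs. */
struct idle_cpu {
	bool tick_running;
	bool rcu_watching; /* false models an RCU extended quiescent state */
	bool need_resched;
};

/* Leaving the idle loop: re-enable RCU and the tick unconditionally. */
static void idle_exit(struct idle_cpu *cpu)
{
	assert(cpu->need_resched);
	cpu->rcu_watching = true;  /* rcu_idle_exit(): the switch may need RCU */
	cpu->tick_running = true;  /* restarted whatever the next task does */
	cpu->need_resched = false; /* the scheduler picks the next task */
}

/*
 * Later tick (assumed rule: stop when a single task is runnable). If it
 * interrupted userspace, enter the extended QS now; otherwise defer that
 * to the next resume to userspace.
 */
static void tick_maybe_stop(struct idle_cpu *cpu, int nr_running,
			    bool interrupted_user)
{
	if (nr_running != 1)
		return;
	cpu->tick_running = false;
	if (interrupted_user)
		cpu->rcu_watching = false; /* rcu_idle_enter() */
}

/* Deferred case: enter the extended QS when returning to userspace. */
static void resume_userspace(struct idle_cpu *cpu)
{
	if (!cpu->tick_running)
		cpu->rcu_watching = false; /* rcu_idle_enter() */
}
```

The design point the model captures is that idle exit never tries to guess where the next task resumes; RCU and the tick come back unconditionally, and only a later, better-informed decision turns them off again.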

>
> 3. Task B wakes up on that CPU, forcing the CPU out of its
> RCU extended quiescent state. However, Task A is higher
> priority than is Task B, so Task A continues running.

Right, but we then have two tasks in the runqueue, so we restart
the tick and call rcu_idle_exit().

>
> 4. Task A invokes a system call. If the system-call entry
> code were to again invoke rcu_idle_enter(), then my patch
> is required. If you check and avoid invoking rcu_idle_enter()
> in this case, then my patch is not required.

You mean rcu_idle_exit()? If so, yes: since the tick is running
and RCU is thus not in extended QS, we won't call rcu_idle_exit() on
syscall entry.
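Paul's four-step scenario can be traced with a hypothetical flag-based model. The names are illustrative, and the asserts in model_enter_eqs()/model_exit_eqs() encode the assumption being checked: as long as each path tests the flag first, enter and exit strictly alternate, and no nesting is required. As a simplification, the tick restart and later stop at step 2 are only described in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU trace state; not kernel structures. */
struct trace_cpu {
	int nr_running;
	bool tick_running;
	bool in_eqs; /* RCU extended quiescent state */
	int exits;   /* stand-in rcu_idle_exit() calls */
	int enters;  /* stand-in rcu_idle_enter() calls */
};

static void model_enter_eqs(struct trace_cpu *c)
{
	assert(!c->in_eqs); /* never enter twice: no nesting needed */
	c->in_eqs = true;
	c->enters++;
}

static void model_exit_eqs(struct trace_cpu *c)
{
	assert(c->in_eqs); /* never exit twice: no nesting needed */
	c->in_eqs = false;
	c->exits++;
}

/* Step 2: Task A wakes on the idle CPU and runs in user mode alone. */
static void wake_first_task(struct trace_cpu *c)
{
	c->nr_running = 1;
	model_exit_eqs(c); /* unconditional rcu_idle_exit() on idle exit */
	/* The tick restarts here; assume a later tick sees a single task,
	 * stops itself and finds Task A in userspace. */
	c->tick_running = false;
	model_enter_eqs(c); /* rcu_idle_enter() */
}

/* Step 3: Task B is enqueued: restart the tick and leave the EQS. */
static void wake_second_task(struct trace_cpu *c)
{
	c->nr_running++;
	if (c->nr_running > 1) {
		c->tick_running = true;
		if (c->in_eqs)
			model_exit_eqs(c);
	}
}

/* Step 4: Task A makes a system call. */
static void trace_syscall_entry(struct trace_cpu *c)
{
	if (c->in_eqs) /* only if user mode really was an extended QS */
		model_exit_eqs(c);
}
```

Starting from an idle CPU in the extended QS, the trace ends with two exits and one enter. At step 4 nothing is called, which matches the answer above.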