Re: [PATCH] rcu: Report a quiescent state when it's exactly in the state

From: Paul E. McKenney
Date: Sat May 12 2018 - 01:07:05 EST


On Fri, May 11, 2018 at 03:41:38PM -0700, Joel Fernandes wrote:
> On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> > On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> > > Hello folks,
> > >
> > > I think I wrote the title in a misleading way.
> > >
> > > Please change the title to something else such as,
> > > "rcu: Report a quiescent state when it's in the state" or,
> > > "rcu: Add points reporting quiescent states where proper" or so on.
> > >
> > > > On 2018-05-11 5:30 PM, Byungchul Park wrote:
> > > > >We expect a quiescent state of TASKS_RCU whenever cond_resched_tasks_rcu_qs()
> > > > >is called, whether or not the task is actually rescheduled. However,
> > > > >the quiescent state is currently not reported when the task enters
> > > > >__schedule(), because it is called with preempt = true. So report
> > > > >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> > > > >called.
> > > > >
> > > > >And in TINY_RCU, the quiescent state of rcu_bh should also be reported
> > > > >when the tick interrupt comes from user mode, but currently it is not.
> > > > >So report it.
> > > > >
> > > > >Lastly, in TREE_RCU, rcu_note_voluntary_context_switch() should be
> > > > >called when the tick interrupt comes not only from user mode but also
> > > > >from idle, as an extended quiescent state.
> > > >
> > > >Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> > > >---
> > > > include/linux/rcupdate.h | 4 ++--
> > > > kernel/rcu/tiny.c | 6 +++---
> > > > kernel/rcu/tree.c | 4 ++--
> > > > 3 files changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > >index ee8cf5fc..7432261 100644
> > > >--- a/include/linux/rcupdate.h
> > > >+++ b/include/linux/rcupdate.h
> > > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > > */
> > > > #define cond_resched_tasks_rcu_qs() \
> > > > do { \
> > > >- if (!cond_resched()) \
> > > >- rcu_note_voluntary_context_switch_lite(current); \
> > > >+ rcu_note_voluntary_context_switch_lite(current); \
> > > >+ cond_resched(); \
> >
> > Ah, good point.
> >
> > Peter, I have to ask... Why is "cond_resched()" considered a preemption
> > while "schedule()" is not?
>
> In fact, here is something interesting I inferred from the __schedule loop
> that is related to your question:
>
> switch_count can be set to either prev->nivcsw or prev->nvcsw. If we can
> assume that switch_count reflects whether the context switch is involuntary
> or voluntary,
>
> task-running-state   preempt   switch_count
> 0 (running)          1         involuntary
> 0                    0         involuntary
> 1                    0         voluntary
> 1                    1         involuntary
>
> According to the above table, both the task's running state and the preempt
> parameter to __schedule should be used together to determine if the switch is
> a voluntary one or not.
>
> So this code in rcu_note_context_switch should really be:
> 	if (!preempt && current->state != TASK_RUNNING)
> 		rcu_note_voluntary_context_switch_lite(current);
>
> According to the above table, cond_resched always classifies as an
> involuntary switch, which makes sense to me. Even though cond_resched is
> explicitly called, it's still sort of involuntary in the sense that it's not
> calling into the scheduler in order to sleep, but rather to see if something
> else can run instead (a preemption point). In fact, none of the task
> deactivation in the __schedule loop will run if cond_resched is used.
>
> I agree that if schedule was called directly but with the task still in the
> TASK_RUNNING state, then that could probably be classified as an involuntary
> switch too...
>
> Also since we're deciding to call rcu_note_voluntary_context_switch_lite
> unconditionally, then IMO this comment on that macro:
>
> /*
> * Note a voluntary context switch for RCU-tasks benefit. This is a
> * macro rather than an inline function to avoid #include hell.
> */
> #ifdef CONFIG_TASKS_RCU
> #define rcu_note_voluntary_context_switch_lite(t)
>
> Should be changed to:
>
> /*
> * Note an attempt to perform a voluntary context switch for RCU-tasks
> * benefit. This is called even in situations where a context switch
> * didn't really happen even though it was requested. This is a
> * macro rather than an inline function to avoid #include hell.
> */
> #ifdef CONFIG_TASKS_RCU
> #define rcu_note_voluntary_context_switch_lite(t)
>
> Right?
>
> Correct me if I'm wrong about anything, thanks,

The starting point for me is that Tasks RCU is a special-purpose mechanism
for freeing trampolines in PREEMPT=y kernels. The approach is to arrange
for the trampoline to be inaccessible to future execution, wait for a
tasks-RCU grace period, then free the trampoline. So a tasks-RCU grace
period must wait until all tasks have spent at least some time outside
of a trampoline. My understanding is that trampolines cannot contain
preemption points, such as cond_resched() and cond_resched_tasks_rcu_qs(),
so we want to count them as quiescent states regardless of whether or
not any associated context switch is counted as involuntary.
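
To make the grace-period rule above concrete, here is a minimal user-space
sketch, not kernel code. It assumes a simplified model in which each task
carries a single holdout flag, loosely following how
rcu_note_voluntary_context_switch_lite() clears a task's holdout state.
All names starting with model_ are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task state that Tasks RCU tracks. */
struct model_task {
	bool holdout;	/* true: the grace period is still waiting on this task */
};

/* Start a grace period: every task is initially a holdout. */
static void model_gp_start(struct model_task *tasks, size_t n)
{
	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		tasks[i].holdout = true;
}

/*
 * Report a quiescent state for one task. In this model it is called
 * unconditionally at a preemption point, whether or not a context
 * switch actually follows.
 */
static void model_note_qs(struct model_task *t)
{
	if (t->holdout)
		t->holdout = false;
}

/* The grace period is over once no task remains a holdout. */
static bool model_gp_done(const struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].holdout)
			return false;
	return true;
}
```

In this model the trampoline would be freed only after model_gp_done()
returns true, that is, after every task has passed through at least one
point that cannot be inside a trampoline.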

What situations lead to the second line of your table above?
The sched_yield() system call, but trampolines don't do system calls,
either, as far as I know.

So it looks to me like that test can leave out the TASK_RUNNING check.
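
As a rough illustration of what leaving out that check changes, the
user-space sketch below compares the two candidate tests over the four rows
of the table. It is a model under stated assumptions, not the kernel
implementation: MODEL_TASK_RUNNING mirrors the kernel's TASK_RUNNING value
of 0, any nonzero state stands for "not running", and the model_ function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TASK_RUNNING 0	/* mirrors the kernel's TASK_RUNNING value */

/* Candidate test that also looks at the task state. */
static bool model_qs_with_state_check(long state, bool preempt)
{
	return !preempt && state != MODEL_TASK_RUNNING;
}

/* Candidate test that looks only at the preempt argument. */
static bool model_qs_preempt_only(long state, bool preempt)
{
	(void)state;	/* deliberately ignored in this variant */
	return !preempt;
}

/* Count the table rows on which the two candidate tests disagree. */
static int model_rows_that_differ(void)
{
	const long states[] = { MODEL_TASK_RUNNING, 1 };
	const bool preempts[] = { false, true };
	int differ = 0;

	for (int i = 0; i < 2; i++)
		for (int j = 0; j < 2; j++)
			if (model_qs_with_state_check(states[i], preempts[j]) !=
			    model_qs_preempt_only(states[i], preempts[j]))
				differ++;
	assert(differ <= 4);	/* only four rows exist */
	return differ;
}
```

In this model the two tests disagree only for a running task that enters
the scheduler with preempt false, which is the sched_yield() row.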

Or am I missing something subtle?

Thanx, Paul