Re: [PATCH RFC tip/core/rcu 09/16] rcu-tasks: Add an RCU-tasks rude variant

From: Paul E. McKenney
Date: Mon Mar 16 2020 - 16:29:44 EST


On Mon, Mar 16, 2020 at 03:47:54PM -0400, Joel Fernandes wrote:
> On Thu, Mar 12, 2020 at 11:16:55AM -0700, paulmck@xxxxxxxxxx wrote:
> > From: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> >
> > This commit adds a "rude" variant of RCU-tasks that has as quiescent
> > states schedule(), cond_resched_tasks_rcu_qs(), userspace execution,
> > and (in theory, anyway) cond_resched(). Updates force an IPI and a
> > context switch on each online CPU. This variant
> > is useful in some situations in tracing.
>
> Would it be possible to better clarify that the "rude version" works only
> from preempt-disabled regions? Is that also true for the "non-rude" version?

The rude version's read-side critical sections are preempt-disabled
regions.
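
For example, a reader and an updater might look as follows. This is
only an illustrative sketch, not code from this series; reader(),
updater(), and shared_ptr are all hypothetical:

	static void *shared_ptr;

	/* Reader: any preempt-disabled region is a read-side critical section. */
	static void reader(void)
	{
		preempt_disable();
		(void)READ_ONCE(shared_ptr);	/* Access the protected data. */
		preempt_enable();		/* Quiescent state. */
	}

	/* Updater: wait out all preempt-disabled regions, then reclaim. */
	static void updater(void *new_p)
	{
		void *old_p = xchg(&shared_ptr, new_p);

		synchronize_rcu_tasks_rude();
		kfree(old_p);	/* No reader can still reference old_p. */
	}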

The prior non-rude version's critical sections are not limited to
preempt-disabled regions, but instead extend between voluntary context
switches, as described in the header comment for synchronize_rcu_tasks().
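
The classic trampoline use case therefore looks something like the
following sketch, where struct trampoline, trampoline_unhook(), and
trampoline_retire() are hypothetical names rather than code from this
series:

	struct trampoline {
		struct rcu_head rh;
		/* ... generated instructions ... */
	};

	static void trampoline_free(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct trampoline, rh));
	}

	static void trampoline_retire(struct trampoline *tp)
	{
		trampoline_unhook(tp);	/* Hypothetical: no new entries. */

		/*
		 * A task cannot do a voluntary context switch while
		 * executing on the trampoline, so once a Tasks RCU grace
		 * period has elapsed, no task can still be on it.
		 */
		call_rcu_tasks(&tp->rh, trampoline_free);
	}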

> Also it would be good to clarify better in cover letter, how these new
> flavors relate to the existing Tasks-RCU implementation.

The cover letter did that for the tracing variant (shown below), but I
can add this information for the rude variant on the next round.

The tracing variant has explicit read-side markers to permit
finite grace periods even given in-kernel loops in PREEMPT=n
builds. It also protects code in the idle loop, on exception
entry/exit paths, and on the various CPU-hotplug online/offline
code paths, thus having protection properties similar to SRCU.
However, unlike SRCU, this variant avoids expensive instructions
in the read-side primitives, thus having read-side overhead
similar to that of preemptible RCU.
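
Readers and updaters of the tracing variant thus look something like
the following sketch, which assumes the rcu_read_lock_trace() family
added later in this series (trace_reader() and trace_updater() are
made-up names):

	#include <linux/rcupdate_trace.h>

	static void trace_reader(void)
	{
		rcu_read_lock_trace();		/* Cheap explicit marker. */
		/* ... access tracing state, even from idle or entry code ... */
		rcu_read_unlock_trace();
	}

	static void trace_updater(void)
	{
		synchronize_rcu_tasks_trace();	/* Wait for all marked readers. */
		/* ... old tracing state may now be freed ... */
	}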

> In the existing one, a quiescent state is a task updating its context-switch
> counters such that it went to sleep at least once, implying there is no
> chance it is on an about-to-be-destroyed trampoline.

Yes, voluntary context switch.
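
That detection can be sketched as follows, greatly simplified from the
actual holdout-list scan, with nvcsw_snap standing in for the per-task
snapshot taken at the start of the grace period:

	/* True if @t has voluntarily switched since the snapshot was taken. */
	static bool rcu_tasks_voluntary_qs(struct task_struct *t,
					   unsigned long nvcsw_snap)
	{
		/* ->nvcsw counts only voluntary context switches. */
		return READ_ONCE(t->nvcsw) != nvcsw_snap;
	}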

> However, here we are trying to determine if a task is no longer on an
> RQ (which I gleaned from the first patch). Sounds very similar, would the
> context-switch counters not help in that determination as well? If it is OK,
> it would be good for the cover letter to describe what exactly a quiescent
> state is and what exactly a reader section is, for both the non-rude and
> rude versions. Thanks!

No, for both of the new variants, a task can be in a quiescent state
while still being on a runqueue. The rude variant can be preempted and
the tracing variant can be outside of its explicitly marked read-side
critical sections.

The existing non-rude version's quiescent states are already listed in
the docbook header comment for synchronize_rcu_tasks().

Thanx, Paul

> thanks,
>
> - Joel
>
>
>
> >
> > Suggested-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > ---
> > include/linux/rcupdate.h | 3 ++
> > kernel/rcu/Kconfig | 12 +++++-
> > kernel/rcu/tasks.h | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 113 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 5523145..2be97a8 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -37,6 +37,7 @@
> > /* Exported common interfaces */
> > void call_rcu(struct rcu_head *head, rcu_callback_t func);
> > void rcu_barrier_tasks(void);
> > +void rcu_barrier_tasks_rude(void);
> > void synchronize_rcu(void);
> >
> > #ifdef CONFIG_PREEMPT_RCU
> > @@ -138,6 +139,8 @@ static inline void rcu_init_nohz(void) { }
> > #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t)
> > void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
> > void synchronize_rcu_tasks(void);
> > +void call_rcu_tasks_rude(struct rcu_head *head, rcu_callback_t func);
> > +void synchronize_rcu_tasks_rude(void);
> > void exit_tasks_rcu_start(void);
> > void exit_tasks_rcu_finish(void);
> > #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > index 38475d0..0d43ec1 100644
> > --- a/kernel/rcu/Kconfig
> > +++ b/kernel/rcu/Kconfig
> > @@ -71,7 +71,7 @@ config TREE_SRCU
> > This option selects the full-fledged version of SRCU.
> >
> > config TASKS_RCU_GENERIC
> > - def_bool TASKS_RCU
> > + def_bool TASKS_RCU || TASKS_RUDE_RCU
> > select SRCU
> > help
> > This option enables generic infrastructure code supporting
> > @@ -84,6 +84,16 @@ config TASKS_RCU
> > only voluntary context switch (not preemption!), idle, and
> > user-mode execution as quiescent states. Not for manual selection.
> >
> > +config TASKS_RUDE_RCU
> > + def_bool 0
> > + help
> > + This option enables a task-based RCU implementation that uses
> > + only context switch (including preemption) and user-mode
> > + execution as quiescent states. It forces IPIs and context
> > + switches on all online CPUs, including idle ones, so use
> > + with caution. Not for manual selection.
> > +
> > config RCU_STALL_COMMON
> > def_bool TREE_RCU
> > help
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index d77921e..1d25c50 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -180,6 +180,9 @@ static void __init rcu_tasks_bootup_oddness(void)
> > else
> > pr_info("\tTasks RCU enabled.\n");
> > #endif /* #ifdef CONFIG_TASKS_RCU */
> > +#ifdef CONFIG_TASKS_RUDE_RCU
> > + pr_info("\tRude variant of Tasks RCU enabled.\n");
> > +#endif /* #ifdef CONFIG_TASKS_RUDE_RCU */
> > }
> >
> > #endif /* #ifndef CONFIG_TINY_RCU */
> > @@ -410,3 +413,99 @@ static int __init rcu_spawn_tasks_kthread(void)
> > core_initcall(rcu_spawn_tasks_kthread);
> >
> > #endif /* #ifdef CONFIG_TASKS_RCU */
> > +
> > +#ifdef CONFIG_TASKS_RUDE_RCU
> > +
> > +////////////////////////////////////////////////////////////////////////
> > +//
> > +// "Rude" variant of Tasks RCU, inspired by Steve Rostedt's trick of
> > +// passing an empty function to schedule_on_each_cpu(). This approach
> > +// provides an asynchronous call_rcu_tasks_rude() API and batching of
> > +// concurrent calls to the synchronous synchronize_rcu_tasks_rude() API.
> > +// This sends IPIs far and wide and induces otherwise unnecessary context
> > +// switches on all online CPUs, whether idle or not.
> > +
> > +// Empty function to allow workqueues to force a context switch.
> > +static void rcu_tasks_be_rude(struct work_struct *work)
> > +{
> > +}
> > +
> > +// Wait for one rude RCU-tasks grace period.
> > +static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> > +{
> > + schedule_on_each_cpu(rcu_tasks_be_rude);
> > +}
> > +
> > +void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> > +DEFINE_RCU_TASKS(rcu_tasks_rude, rcu_tasks_rude_wait_gp, call_rcu_tasks_rude);
> > +
> > +/**
> > + * call_rcu_tasks_rude() - Queue a callback for a rude task-based grace period
> > + * @rhp: structure to be used for queueing the RCU updates.
> > + * @func: actual callback function to be invoked after the grace period
> > + *
> > + * The callback function will be invoked some time after a full grace
> > + * period elapses, in other words after all currently executing RCU
> > + * read-side critical sections have completed. call_rcu_tasks_rude()
> > + * assumes that the read-side critical sections end at context switch,
> > + * cond_resched_tasks_rcu_qs(), or transition to usermode execution. As such,
> > + * there are no read-side primitives analogous to rcu_read_lock() and
> > + * rcu_read_unlock() because this primitive is intended to determine
> > + * that all tasks have passed through a safe state, not so much for
> > + * data-structure synchronization.
> > + *
> > + * See the description of call_rcu() for more detailed information on
> > + * memory ordering guarantees.
> > + */
> > +void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func)
> > +{
> > + call_rcu_tasks_generic(rhp, func, &rcu_tasks_rude);
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu_tasks_rude);
> > +
> > +/**
> > + * synchronize_rcu_tasks_rude - wait for a rude rcu-tasks grace period
> > + *
> > + * Control will return to the caller some time after a rude rcu-tasks
> > + * grace period has elapsed, in other words after all currently
> > + * executing rcu-tasks read-side critical sections have completed. These
> > + * read-side critical sections are delimited by calls to schedule(),
> > + * cond_resched_tasks_rcu_qs(), userspace execution, and (in theory,
> > + * anyway) cond_resched().
> > + *
> > + * This is a very specialized primitive, intended only for a few uses in
> > + * tracing and other situations requiring manipulation of function preambles
> > + * and profiling hooks. The synchronize_rcu_tasks_rude() function is not
> > + * (yet) intended for heavy use from multiple CPUs.
> > + *
> > + * See the description of synchronize_rcu() for more detailed information
> > + * on memory ordering guarantees.
> > + */
> > +void synchronize_rcu_tasks_rude(void)
> > +{
> > + synchronize_rcu_tasks_generic(&rcu_tasks_rude);
> > +}
> > +EXPORT_SYMBOL_GPL(synchronize_rcu_tasks_rude);
> > +
> > +/**
> > + * rcu_barrier_tasks_rude - Wait for in-flight call_rcu_tasks_rude() callbacks.
> > + *
> > + * Although the current implementation is guaranteed to wait, it is not
> > + * obligated to do so, for example, if there are no pending callbacks.
> > + */
> > +void rcu_barrier_tasks_rude(void)
> > +{
> > + /* There is only one callback queue, so this is easy. ;-) */
> > + synchronize_rcu_tasks_rude();
> > +}
> > +EXPORT_SYMBOL_GPL(rcu_barrier_tasks_rude);
> > +
> > +static int __init rcu_spawn_tasks_rude_kthread(void)
> > +{
> > + rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
> > + return 0;
> > +}
> > +core_initcall(rcu_spawn_tasks_rude_kthread);
> > +
> > +#endif /* #ifdef CONFIG_TASKS_RUDE_RCU */
> > --
> > 2.9.5
> >