Re: [PATCH 1/2] rcu: Show the real fqs_state

From: Paul E. McKenney
Date: Tue Sep 08 2015 - 15:59:26 EST


On Mon, Sep 07, 2015 at 04:58:27PM +0200, Petr Mladek wrote:
> On Fri 2015-09-04 16:24:22, Paul E. McKenney wrote:
> > On Fri, Sep 04, 2015 at 02:11:29PM +0200, Petr Mladek wrote:
> > > The value of "fqs_state" in struct rcu_state is always RCU_GP_IDLE.
> > >
> > > The real state is stored in a local variable in rcu_gp_kthread().
> > > It is modified by rcu_gp_fqs() via parameter and return value.
> > > But the actual value is never stored to rsp->fqs_state.
> > >
> > > The result is that print_one_rcu_state() does not show the real
> > > state.
> > >
> > > This code was added 3 years ago by commit 4cdfc175c25c89ee
> > > ("rcu: Move quiescent-state forcing into kthread"). I guess that it
> > > was an oversight or an optimization.
> > >
> > > Anyway, the value seems to be manipulated only by the thread, except
> > > for showing the status. I do not see any risk in updating it directly
> > > in the struct.
> > >
> > > Signed-off-by: Petr Mladek <pmladek@xxxxxxxx>
> >
> > Good catch, but how about the following fix instead?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > rcu: Finish folding ->fqs_state into ->gp_state
> >
> > Commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing
> > into kthread") started the process of folding the old ->fqs_state
> > into ->gp_state, but did not complete it. This situation does not
> > cause any malfunction, but can result in extremely confusing trace
> > output. This commit completes this task of eliminating ->fqs_state
> > in favor of ->gp_state.
>
> It makes sense but it breaks dynticks handling in rcu_gp_fqs(), see
> below.

Indeed, more confusion on my part!

> > Reported-by: Petr Mladek <pmladek@xxxxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 69ab7ce2cf7b..04234936d897 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1949,16 +1949,15 @@ static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
> > /*
> > * Do one round of quiescent-state forcing.
> > */
> > -static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
> > +static void rcu_gp_fqs(struct rcu_state *rsp)
> > {
> > - int fqs_state = fqs_state_in;
> > bool isidle = false;
> > unsigned long maxj;
> > struct rcu_node *rnp = rcu_get_root(rsp);
> >
> > WRITE_ONCE(rsp->gp_activity, jiffies);
> > rsp->n_force_qs++;
> > - if (fqs_state == RCU_SAVE_DYNTICK) {
> > + if (rsp->gp_state == RCU_SAVE_DYNTICK) {
>
> This will never happen because rcu_gp_kthread() modifies rsp->gp_state
> many times. The last value before calling rcu_gp_fqs() is
> RCU_GP_DOING_FQS.
>
> I am thinking about passing this information via a separate bool.
>
> [...]
>
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index d5f58e717c8b..9faad70a8246 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -417,12 +417,11 @@ struct rcu_data {
> > struct rcu_state *rsp;
> > };
> >
> > -/* Values for fqs_state field in struct rcu_state. */
> > +/* Values for gp_state field in struct rcu_state. */
> > #define RCU_GP_IDLE 0 /* No grace period in progress. */
>
> This value seems to be used instead of the new RCU_GP_WAIT_INIT.
>
> > #define RCU_GP_INIT 1 /* Grace period being initialized. */
>
> This value is unused.
>
> > #define RCU_SAVE_DYNTICK 2 /* Need to scan dyntick state. */
>
> This one is no longer preserved when merged with the other state.
>
> > #define RCU_FORCE_QS 3 /* Need to force quiescent state. */
>
> The meaning of this one is strange. If I get it correctly,
> it is set after the state was forced. But the comment suggests
> that it is before.
>
> In other words, these states seem to have been obsoleted by
>
> /* Values for rcu_state structure's gp_flags field. */
> #define RCU_GP_WAIT_INIT 0 /* Initial state. */
> #define RCU_GP_WAIT_GPS 1 /* Wait for grace-period start. */
> #define RCU_GP_DONE_GPS 2 /* Wait done for grace-period start. */
> #define RCU_GP_WAIT_FQS 3 /* Wait for force-quiescent-state time. */
> #define RCU_GP_DOING_FQS 4 /* Wait done for force-quiescent-state time. */
> #define RCU_GP_CLEANUP 5 /* Grace-period cleanup started. */
> #define RCU_GP_CLEANED 6 /* Grace-period cleanup complete. */
>
>
> Please, find below your commit updated with my ideas:
>
> + use bool save_dyntick instead of RCU_SAVE_DYNTICK
> and RCU_FORCE_QS states
> + rename RCU_GP_WAIT_INIT -> RCU_GP_IDLE
> + remove all the obsolete states
>
> I am sorry if I handled the "Signed-off-by" tags the wrong way. It is
> basically your patch with a few small updates from me. I am not sure
> what is the right process in this case. Feel free to use Reviewed-by
> instead of Signed-off-by with my name.
>
> Well, I guess that this is not the final state ;-)

Good points, but perhaps an easier solution would be to have a
"firsttime" argument to rcu_gp_fqs() that said whether or not this
was the first call to rcu_gp_fqs() during the current grace period.
If this is the first call, then take the "if" branch that passes
dyntick_save_progress_counter() to force_qs_rnp(), otherwise take the
other branch.
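
The "firsttime" idea can be sketched in user-space C roughly as follows.
This is only an illustration, not a patch: struct rcu_state_sketch, the
fqs_scan enum, and both *_sketch() functions are hypothetical stand-ins,
and the two branches merely record which scan would run instead of
calling force_qs_rnp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of this is kernel code. */
enum fqs_scan { FQS_SCAN_NONE, FQS_SCAN_SAVE_DYNTICK, FQS_SCAN_RECHECK };

struct rcu_state_sketch {
	unsigned long n_force_qs;	/* Number of forcing rounds so far. */
	enum fqs_scan last_scan;	/* Which branch the last round took. */
};

/* One round of quiescent-state forcing; the caller says if it is the first. */
static void rcu_gp_fqs_sketch(struct rcu_state_sketch *rsp, bool firsttime)
{
	assert(rsp != NULL);
	rsp->n_force_qs++;
	if (firsttime)
		rsp->last_scan = FQS_SCAN_SAVE_DYNTICK;	/* Collect snapshots. */
	else
		rsp->last_scan = FQS_SCAN_RECHECK;	/* Check against them. */
}

/* Models the forcing loop of one grace period that runs "rounds" rounds. */
static void run_one_gp_sketch(struct rcu_state_sketch *rsp, int rounds)
{
	bool first_gp_fqs = true;	/* Reset at the start of each GP. */
	int i;

	for (i = 0; i < rounds; i++) {
		rcu_gp_fqs_sketch(rsp, first_gp_fqs);
		first_gp_fqs = false;
	}
}
```

In this sketch the flag is a local of the loop function, so nothing in
the state structure needs to remember which branch to take.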

An alternative approach would use the bottom bit of ->gp_state to
record whether or not the current grace period had done its first
call to rcu_gp_fqs().
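
One possible encoding of the bottom-bit alternative is sketched below.
The encoding is an assumption made purely for illustration: the state
number is shifted up by one bit and bit 0 records that the first
rcu_gp_fqs() call has happened. The macro and helper names are all
hypothetical, and the existing RCU_GP_* values would need renumbering to
fit such a scheme.

```c
#include <assert.h>
#include <stdbool.h>

#define GP_FQS_DONE_BIT	0x1	/* Hypothetical: first FQS call has run. */
#define GP_STATE_SHIFT	1	/* Hypothetical: state sits above bit 0. */

/* Combine a state number and the "first FQS done" flag into one short. */
static short gp_state_pack(short state, bool first_fqs_done)
{
	assert(state >= 0);
	return (short)((state << GP_STATE_SHIFT) |
		       (first_fqs_done ? GP_FQS_DONE_BIT : 0));
}

/* Extract the state number, ignoring the flag bit. */
static short gp_state_value(short packed)
{
	return (short)(packed >> GP_STATE_SHIFT);
}

/* Has this grace period already done its first FQS call? */
static bool gp_first_fqs_done(short packed)
{
	return (packed & GP_FQS_DONE_BIT) != 0;
}

/* Change the state number while preserving the flag bit. */
static short gp_state_set(short packed, short state)
{
	return gp_state_pack(state, gp_first_fqs_done(packed));
}

/* Record that the first FQS call has run; the state is unchanged. */
static short gp_mark_first_fqs_done(short packed)
{
	return (short)(packed | GP_FQS_DONE_BIT);
}
```

Under this assumed encoding, a trace reader would have to shift the
printed value right by one bit to recover the state number.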

But I am not generating the patch today, just flew across the Pacific
yesterday. ;-)

Thanx, Paul

> From 61a1bf6659f4f4c0c4021f185bc156f8c83f9ea5 Mon Sep 17 00:00:00 2001
> From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Fri, 4 Sep 2015 16:24:22 -0700
> Subject: [PATCH] rcu: Finish folding ->fqs_state into ->gp_state
>
> Commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing
> into kthread") started the process of folding the old ->fqs_state
> into ->gp_state, but did not complete it. This situation does not
> cause any malfunction, but can result in extremely confusing trace
> output. This commit completes this task of eliminating ->fqs_state
> in favor of ->gp_state.
>
> The old fqs_state had one side effect. It was used to decide whether
> to collect dyntick-idle snapshots. For this purpose, we add a boolean
> into the state struct.
>
> Reported-by: Petr Mladek <pmladek@xxxxxxxx>
> Signed-off-by: Petr Mladek <pmladek@xxxxxxxx>
> ---
> kernel/rcu/tree.c | 17 +++++++----------
> kernel/rcu/tree.h | 16 +++++-----------
> kernel/rcu/tree_trace.c | 2 +-
> 3 files changed, 13 insertions(+), 22 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9f75f25cc5d9..f47067fdc783 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -98,7 +98,7 @@ struct rcu_state sname##_state = { \
> .level = { &sname##_state.node[0] }, \
> .rda = &sname##_data, \
> .call = cr, \
> - .fqs_state = RCU_GP_IDLE, \
> + .gp_state = RCU_GP_IDLE, \
> .gpnum = 0UL - 300UL, \
> .completed = 0UL - 300UL, \
> .orphan_lock = __RAW_SPIN_LOCK_UNLOCKED(&sname##_state.orphan_lock), \
> @@ -1927,16 +1927,15 @@ static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
> /*
> * Do one round of quiescent-state forcing.
> */
> -static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
> +static void rcu_gp_fqs(struct rcu_state *rsp)
> {
> - int fqs_state = fqs_state_in;
> bool isidle = false;
> unsigned long maxj;
> struct rcu_node *rnp = rcu_get_root(rsp);
>
> WRITE_ONCE(rsp->gp_activity, jiffies);
> rsp->n_force_qs++;
> - if (fqs_state == RCU_SAVE_DYNTICK) {
> + if (rsp->save_dyntick) {
> /* Collect dyntick-idle snapshots. */
> if (is_sysidle_rcu_state(rsp)) {
> isidle = true;
> @@ -1945,7 +1944,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
> force_qs_rnp(rsp, dyntick_save_progress_counter,
> &isidle, &maxj);
> rcu_sysidle_report_gp(rsp, isidle, maxj);
> - fqs_state = RCU_FORCE_QS;
> + rsp->save_dyntick = false;
> } else {
> /* Handle dyntick-idle and offline CPUs. */
> isidle = true;
> @@ -1959,7 +1958,6 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
> READ_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS);
> raw_spin_unlock_irq(&rnp->lock);
> }
> - return fqs_state;
> }
>
> /*
> @@ -2023,7 +2021,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> /* Declare grace period done. */
> WRITE_ONCE(rsp->completed, rsp->gpnum);
> trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
> - rsp->fqs_state = RCU_GP_IDLE;
> + rsp->gp_state = RCU_GP_IDLE;
> rdp = this_cpu_ptr(rsp->rda);
> /* Advance CBs to reduce false positives below. */
> needgp = rcu_advance_cbs(rsp, rnp, rdp) || needgp;
> @@ -2041,7 +2039,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> */
> static int __noreturn rcu_gp_kthread(void *arg)
> {
> - int fqs_state;
> int gf;
> unsigned long j;
> int ret;
> @@ -2073,7 +2070,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
> }
>
> /* Handle quiescent-state forcing. */
> - fqs_state = RCU_SAVE_DYNTICK;
> + rsp->save_dyntick = true;
> j = jiffies_till_first_fqs;
> if (j > HZ) {
> j = HZ;
> @@ -2101,7 +2098,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
> trace_rcu_grace_period(rsp->name,
> READ_ONCE(rsp->gpnum),
> TPS("fqsstart"));
> - fqs_state = rcu_gp_fqs(rsp, fqs_state);
> + rcu_gp_fqs(rsp);
> trace_rcu_grace_period(rsp->name,
> READ_ONCE(rsp->gpnum),
> TPS("fqsend"));
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 2e991f8361e4..12303ff25077 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -412,13 +412,6 @@ struct rcu_data {
> struct rcu_state *rsp;
> };
>
> -/* Values for fqs_state field in struct rcu_state. */
> -#define RCU_GP_IDLE 0 /* No grace period in progress. */
> -#define RCU_GP_INIT 1 /* Grace period being initialized. */
> -#define RCU_SAVE_DYNTICK 2 /* Need to scan dyntick state. */
> -#define RCU_FORCE_QS 3 /* Need to force quiescent state. */
> -#define RCU_SIGNAL_INIT RCU_SAVE_DYNTICK
> -
> /* Values for nocb_defer_wakeup field in struct rcu_data. */
> #define RCU_NOGP_WAKE_NOT 0
> #define RCU_NOGP_WAKE 1
> @@ -469,15 +462,16 @@ struct rcu_state {
>
> /* The following fields are guarded by the root rcu_node's lock. */
>
> - u8 fqs_state ____cacheline_internodealigned_in_smp;
> - /* Force QS state. */
> - u8 boost; /* Subject to priority boost. */
> + u8 boost ____cacheline_internodealigned_in_smp;
> + /* Subject to priority boost. */
> unsigned long gpnum; /* Current gp number. */
> unsigned long completed; /* # of last completed gp. */
> struct task_struct *gp_kthread; /* Task for grace periods. */
> wait_queue_head_t gp_wq; /* Where GP task waits. */
> short gp_flags; /* Commands for GP task. */
> short gp_state; /* GP kthread sleep state. */
> + bool save_dyntick; /* Collect dyntick-idle */
> + /* snapshots when forcing QS. */
>
> /* End of fields guarded by root rcu_node's lock. */
>
> @@ -539,7 +533,7 @@ struct rcu_state {
> #define RCU_GP_FLAG_FQS 0x2 /* Need grace-period quiescent-state forcing. */
>
> /* Values for rcu_state structure's gp_flags field. */
> -#define RCU_GP_WAIT_INIT 0 /* Initial state. */
> +#define RCU_GP_IDLE 0 /* Initial state and no GP in progress. */
> #define RCU_GP_WAIT_GPS 1 /* Wait for grace-period start. */
> #define RCU_GP_DONE_GPS 2 /* Wait done for grace-period start. */
> #define RCU_GP_WAIT_FQS 3 /* Wait for force-quiescent-state time. */
> diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
> index 6fc4c5ff3bb5..1d61f5ba4641 100644
> --- a/kernel/rcu/tree_trace.c
> +++ b/kernel/rcu/tree_trace.c
> @@ -268,7 +268,7 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
> gpnum = rsp->gpnum;
> seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x ",
> ulong2long(rsp->completed), ulong2long(gpnum),
> - rsp->fqs_state,
> + rsp->gp_state,
> (long)(rsp->jiffies_force_qs - jiffies),
> (int)(jiffies & 0xffff));
> seq_printf(m, "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld/%ld\n",
> --
> 1.8.5.6
>
>
