[PATCH] Remove GP_REPLAY state from rcu_sync

From: Joel Fernandes (Google)
Date: Fri Oct 04 2019 - 10:57:51 EST


From: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>

Please consider this is an RFC for discussion only. Just want to discuss
why the GP_REPLAY state is needed at all.

Here's the intention AFAICS:
When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
we wait for a grace period before transitioning to GP_IDLE. In the
meanwhile, if we receive another rcu_sync_exit(), then we want to wait
for another GP to account for that.

Drawing some timing diagrams, it looks like this:

Legend:
rse = rcu_sync_enter
rsx = rcu_sync_exit
i = GP_IDLE
x = GP_EXIT
r = GP_REPLAY
e = GP_ENTER
p = GP_PASSED
rx = GP_REPLAY changes to GP_EXIT

GP num = The GP we are one.

note: A GP passes between the states:
e and p
x and i
x and rx
rx and i

In a simple case, the timing and states look like:
time
---------------------->
GP num 1111111 2222222
GP state i e p x i
CPU0 : rse rsx

However we can enter the replay state like this:
time
---------------------->
GP num 1111111 2222222222222222222223333333
GP state i e p x r rx i
CPU0 : rse rsx
CPU1 : rse rsx

Due to the second rse + rsx, we had to wait for another GP.

I believe the rationale is, if another rsx happens, another GP has to
happen.

But this is not always true if you consider the following events:

time
---------------------->
GP num 111111 22222222222222222222222222222222233333333
GP state i e p x r rx i
CPU0 : rse rsx
CPU1 : rse rsx
CPU2 : rse rsx

Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
and 2 for the rcu_sync_exit(s).

However, we had 3 rcu_sync_exit()s, not 2. In other words, the
rcu_sync_exit() got batched.

So my point here is, rcu_sync_exit() does not really always cause a new
GP to happen and we can end up having the rcu_sync_exit()s as batched
and sharing the same grace period.

Then what is the point of the GP_REPLAY state at all if it does not
always wait for a new GP? Taking a step back, why did we intend to have
to wait for a new GP if another rcu_sync_exit() comes while one is still
in progress?

Cc: bristot@xxxxxxxxxx
Cc: peterz@xxxxxxxxxxxxx
Cc: oleg@xxxxxxxxxx
Cc: paulmck@xxxxxxxxxx
Cc: rcu@xxxxxxxxxxxxxxx
Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
---
kernel/rcu/sync.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index d4558ab7a07d..4f3aad67992c 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -10,7 +10,7 @@
#include <linux/rcu_sync.h>
#include <linux/sched.h>

-enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
+enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };

#define rss_lock gp_wait.lock

@@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
*/
WRITE_ONCE(rsp->gp_state, GP_PASSED);
wake_up_locked(&rsp->gp_wait);
- } else if (rsp->gp_state == GP_REPLAY) {
- /*
- * A new rcu_sync_exit() has happened; requeue the callback to
- * catch a later GP.
- */
- WRITE_ONCE(rsp->gp_state, GP_EXIT);
- rcu_sync_call(rsp);
} else {
/*
* We're at least a GP after the last rcu_sync_exit(); eveybody
@@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
*/
void rcu_sync_exit(struct rcu_sync *rsp)
{
- WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
- WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
+ WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);

spin_lock_irq(&rsp->rss_lock);
if (!--rsp->gp_count) {
if (rsp->gp_state == GP_PASSED) {
WRITE_ONCE(rsp->gp_state, GP_EXIT);
rcu_sync_call(rsp);
- } else if (rsp->gp_state == GP_EXIT) {
- WRITE_ONCE(rsp->gp_state, GP_REPLAY);
}
}
spin_unlock_irq(&rsp->rss_lock);
--
2.23.0.581.g78d2f28ef7-goog