Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

From: Paul E. McKenney
Date: Wed Mar 06 2024 - 22:21:23 EST


On Wed, Mar 06, 2024 at 06:49:38PM -0800, Linus Torvalds wrote:
> On Wed, 6 Mar 2024 at 18:43, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I dunno.
>
> Oh, and just looking at that patch, I still think the code is confused.
>
> On the reading side, we have:
>
>     pipe_count = smp_load_acquire(&p->rtort_pipe_count);
>     if (pipe_count > RCU_TORTURE_PIPE_LEN) {
>             /* Should not happen, but... */
>
> where that comment clearly says that the pipe_count we read (whether
> with READ_ONCE() or with my smp_load_acquire() suggestion) should
> never be larger than RCU_TORTURE_PIPE_LEN.

I will fix that comment. It should not happen *if* RCU is working
correctly. It can happen if you have an RCU that is so broken that a
single RCU reader can span more than ten grace periods. An example of
an RCU that really is this broken can be selected using rcutorture's
torture_type=busted module parameter. No surprise, given that its
implementation of call_rcu() invokes the callback function directly and
its implementation of synchronize_rcu() is a complete no-op. ;-)

Of course, the purpose of that value of the torture_type module parameter
(along with all other possible values containing the string "busted")
is to test rcutorture itself.
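
For anyone who has not looked at that flavor, here is a minimal user-space
sketch of what "busted" means. It is an illustration only, not the
rcutorture code: every name in it (sketch_rcu_head, busted_call_rcu(),
busted_synchronize_rcu(), demo_busted_call_rcu()) is hypothetical. It shows
only the two properties described above: the callback runs immediately, and
nothing ever waits for readers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct rcu_head. */
struct sketch_rcu_head {
	void (*func)(struct sketch_rcu_head *head);
};

/* "Busted" call_rcu(): invokes the callback at once, with no grace period. */
static void busted_call_rcu(struct sketch_rcu_head *head,
			    void (*func)(struct sketch_rcu_head *head))
{
	assert(func != NULL);
	head->func = func;
	func(head);	/* A correct RCU would defer this past all readers. */
}

/* "Busted" synchronize_rcu(): returns without waiting for any reader. */
static void busted_synchronize_rcu(void)
{
}

static int callbacks_invoked;

/* Callback that only counts how often it has been invoked. */
static void count_callback(struct sketch_rcu_head *head)
{
	(void)head;
	callbacks_invoked++;
}

/* Returns the number of callbacks that ran before call_rcu() returned. */
static int demo_busted_call_rcu(void)
{
	struct sketch_rcu_head head = { NULL };

	callbacks_invoked = 0;
	busted_call_rcu(&head, count_callback);
	busted_synchronize_rcu();
	return callbacks_invoked;
}
```

Because the callback has already run by the time busted_call_rcu() returns,
a reader that is still inside its critical section gets no protection at
all, which is how a single reader can appear to span arbitrarily many
"grace periods".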

> But the writing side very clearly did:
>
>     i = rp->rtort_pipe_count;
>     if (i > RCU_TORTURE_PIPE_LEN)
>             i = RCU_TORTURE_PIPE_LEN;
>     ...
>     smp_store_release(&rp->rtort_pipe_count, ++i);
>
> (again, syntactically it could have been "i + 1" instead of my "++i" -
> same value), so clearly the writing side *can* write a value that is >
> RCU_TORTURE_PIPE_LEN.
>
> So while the whole READ/WRITE_ONCE vs smp_load_acquire/store_release
> is one thing that might be worth looking at, I think there are other
> very confusing aspects here.
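
The arithmetic in the quoted snippets can be checked with a small
user-space sketch. It is a simplification, not the rcutorture code: the
helper names are hypothetical, the memory-ordering primitives are left out
because only the values matter here, and SKETCH_PIPE_LEN is an assumed
stand-in for RCU_TORTURE_PIPE_LEN with the value 10.

```c
#include <assert.h>

#define SKETCH_PIPE_LEN 10	/* Assumed stand-in for RCU_TORTURE_PIPE_LEN. */

/* Writer side: clamp the old count first, then publish old + 1. */
static int sketch_writer_next_count(int old_count)
{
	int i = old_count;

	assert(old_count >= 0);
	if (i > SKETCH_PIPE_LEN)
		i = SKETCH_PIPE_LEN;
	return i + 1;	/* Can be SKETCH_PIPE_LEN + 1, but never more. */
}

/* Reader side: clamp whatever value was loaded. */
static int sketch_reader_clamp(int loaded)
{
	return loaded > SKETCH_PIPE_LEN ? SKETCH_PIPE_LEN : loaded;
}
```

With these definitions, sketch_writer_next_count(10) yields 11, so the
stored value can indeed exceed the pipe length by one, and
sketch_reader_clamp(11) brings it back to 10 on the read side.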

With this change in that comment, are things better?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 6b821a7037b03..0cb5452ecd945 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2000,7 +2000,8 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid)
 	preempt_disable();
 	pipe_count = READ_ONCE(p->rtort_pipe_count);
 	if (pipe_count > RCU_TORTURE_PIPE_LEN) {
-		/* Should not happen, but... */
+		// Should not happen in a correct RCU implementation,
+		// happens quite often for torture_type=busted.
 		pipe_count = RCU_TORTURE_PIPE_LEN;
 	}
 	completed = cur_ops->get_gp_seq();