Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

From: Paul E. McKenney
Date: Fri Jun 11 2021 - 13:25:24 EST


On Fri, Jun 11, 2021 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 10, 2021 at 09:57:10AM -0700, Paul E. McKenney wrote:
> > diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > index 11cdab037bff..3cd5cb4d86e5 100644
> > --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > @@ -112,6 +112,35 @@ on PowerPC.
> > The ``smp_mb__after_unlock_lock()`` invocations prevent this
> > ``WARN_ON()`` from triggering.
> >
> > ++-----------------------------------------------------------------------+
> > +| **Quick Quiz**: |
> > ++-----------------------------------------------------------------------+
> > +| But the whole chain of rcu_node-structure locking guarantees that |
> > +| readers see all pre-grace-period accesses from the updater and |
> > +| also guarantees that the updater to see all post-grace-period |
>
> Should it be either "that the updater see" or "the updater to see"?

Good catch, I have reworked this paragraph.

> > +| accesses from the readers.
>
> Is it really post-grace-period that you meant here? The updater can't see
> the future. It's rather all reader accesses before the end of the grace period?

I have reworked this to talk about old and new readers on the one hand
and the updater's pre- and post-grace-period accesses on the other.

> > So why do we need all of those calls |
> > +| to smp_mb__after_unlock_lock()? |
> > ++-----------------------------------------------------------------------+
> > +| **Answer**: |
> > ++-----------------------------------------------------------------------+
> > +| Because we must provide ordering for RCU's polling grace-period |
> > +| primitives, for example, get_state_synchronize_rcu() and |
> > +| poll_state_synchronize_rcu(). For example: |
>
> Two times "for example" (sorry I'm nitpicking...)

But the example has two threads!

Kidding aside, I substituted "Consider this code" for the second
"For example".

> > +| |
> > +|   CPU 0                                     CPU 1                     |
> > +|   ----                                      ----                      |
> > +|   WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)          |
> > +|   g = get_state_synchronize_rcu()           smp_mb()                  |
> > +|   while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)         |
> > +|           continue;                                                   |
> > +|   r0 = READ_ONCE(Y)                                                   |
>
> Good point, it's a nice merge of the initial examples!

Glad you like it!

> > +| |
> > +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not |
>
> One "that" has to die here.

Can we instead show clemency and banish it to some other paragraph?

> > +| happen, even if CPU 1 is in an RCU extended quiescent state (idle |
> > +| or offline) and thus won't interact directly with the RCU core |
> > +| processing at all. |
>
> Thanks a lot!

Glad to help, and I will reach out to you should someone make the mistake
of insisting that I write something in French. ;-)

> > ++-----------------------------------------------------------------------+
> > +
> > This approach must be extended to include idle CPUs, which need
> > RCU's grace-period memory ordering guarantee to extend to any
> > RCU read-side critical sections preceding and following the current

How about like this?

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But the chain of rcu_node-structure lock acquisitions guarantees      |
| that new readers will see all of the updater's pre-grace-period       |
| accesses and also guarantees that the updater's post-grace-period     |
| accesses will see all of the old readers' accesses. So why do we      |
| need all of those calls to smp_mb__after_unlock_lock()?               |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because we must provide ordering for RCU's polling grace-period       |
| primitives, for example, get_state_synchronize_rcu() and              |
| poll_state_synchronize_rcu(). Consider this code::                    |
|                                                                       |
|   CPU 0                                     CPU 1                     |
|   ----                                      ----                      |
|   WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)          |
|   g = get_state_synchronize_rcu()           smp_mb()                  |
|   while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)         |
|           continue;                                                   |
|   r0 = READ_ONCE(Y)                                                   |
|                                                                       |
| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
| happen, even if CPU 1 is in an RCU extended quiescent state           |
| (idle or offline) and thus won't interact directly with the RCU       |
| core processing at all.                                               |
+-----------------------------------------------------------------------+
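
For anyone who wants to poke at the shape of this litmus test outside
the kernel, here is a rough userspace analogue. It is only a sketch
and not RCU code: it uses C11 atomics and pthreads, and it assumes
that the get_state_synchronize_rcu()/poll_state_synchronize_rcu()
pair on CPU 0 can be modeled by a single full fence, which is exactly
the ordering property the quiz answer says RCU must provide. The
names cpu0(), cpu1(), and count_forbidden_outcomes() are made up for
illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int X, Y;
static int r0, r1;	/* Read by the parent only after pthread_join(). */

static void *cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	/* Assumption: a full fence stands in for the polled grace period. */
	atomic_thread_fence(memory_order_seq_cst);
	r0 = atomic_load_explicit(&Y, memory_order_relaxed);
	return NULL;
}

static void *cpu1(void *unused)
{
	(void)unused;
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	/* Userspace counterpart of smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);
	r1 = atomic_load_explicit(&X, memory_order_relaxed);
	return NULL;
}

/* Run the two-thread test repeatedly; return how often r0 == 0 && r1 == 0. */
int count_forbidden_outcomes(int trials)
{
	int forbidden = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		atomic_store(&X, 0);
		atomic_store(&Y, 0);
		if (pthread_create(&t0, NULL, cpu0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, cpu1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == 0 && r1 == 0)
			forbidden++;
	}
	return forbidden;
}
```

Calling count_forbidden_outcomes() from a small main() and building
with -pthread should report zero forbidden outcomes, because both
threads have a full fence between their store and their load. Removing
either fence makes the outcome r0 == 0 && r1 == 0 permitted, which is
the userspace counterpart of the situation that the
smp_mb__after_unlock_lock() calls are there to prevent.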

Thanx, Paul