Re: [PATCH v3 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

From: Paul E. McKenney
Date: Tue Jul 03 2018 - 11:37:13 EST


On Tue, Jul 03, 2018 at 04:53:59PM +0200, Andrea Parri wrote:
> There are 11 interpretations of the requirements described in the header
> comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and
> one currently encoded in the Cat file. Stick to the latter (until a more
> satisfactory solution is available).
>
> This also reworks some snippets related to the barrier to illustrate the
> requirements and to link them to the idioms which are relied upon at its
> call sites.
>
> Suggested-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> Signed-off-by: Andrea Parri <andrea.parri@xxxxxxxxxxxxxxxxxxxx>
> Acked-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Will Deacon <will.deacon@xxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>

Looks good; a couple of changes suggested below.

Thanx, Paul

> ---
> Changes since v2:
> - restore note about RCsc lock (Peter Zijlstra)
> - add Peter's Acked-by: tag
>
> Changes since v1:
> - rework the snippets (Peter Zijlstra)
> - style fixes (Alan Stern and Matthew Wilcox)
> - add Boqun's Suggested-by: tag
>
> include/linux/spinlock.h | 53 ++++++++++++++++++++++++++++++++----------------
> kernel/sched/core.c | 41 +++++++++++++++++++------------------
> 2 files changed, 57 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 1e8a464358384..d70a06ff2bdd2 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -114,29 +114,48 @@ do { \
> #endif /*arch_spin_is_contended*/
>
> /*
> - * This barrier must provide two things:
> + * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
> + * between program-order earlier lock acquisitions and program-order later

Not just the earlier lock acquisition, but also all program-order earlier
memory accesses, correct?

> + * memory accesses.
> *
> - * - it must guarantee a STORE before the spin_lock() is ordered against a
> - * LOAD after it, see the comments at its two usage sites.
> + * This guarantees that the following two properties hold:
> *
> - * - it must ensure the critical section is RCsc.
> + * 1) Given the snippet:
> *
> - * The latter is important for cases where we observe values written by other
> - * CPUs in spin-loops, without barriers, while being subject to scheduling.
> + * { X = 0; Y = 0; }
> *
> - * CPU0 CPU1 CPU2
> + * CPU0 CPU1
> *
> - * for (;;) {
> - * if (READ_ONCE(X))
> - * break;
> - * }
> - * X=1
> - * <sched-out>
> - * <sched-in>
> - * r = X;
> + * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
> + * spin_lock(S); smp_mb();
> + * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
> + * r0 = READ_ONCE(Y);
> + * spin_unlock(S);
> *
> - * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> - * we get migrated and CPU2 sees X==0.
> + * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
> + * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
> + * preceding the call to smp_mb__after_spinlock() in __schedule() and in
> + * try_to_wake_up().

Should we say that this is an instance of the SB (store-buffering)
pattern? (Am OK either way, just asking the question.)
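
For anyone who wants to poke at the shape of snippet (1) outside the kernel,
here is a rough userspace sketch. It is only an analogy and not part of the
patch: it assumes that a C11 seq_cst fence can stand in for both smp_mb() and
smp_mb__after_spinlock(), and that a pthread mutex can stand in for the
spinlock. The helper name sb_count_forbidden() is made up for illustration.
The C11 model and the LKMM differ, so a clean run here says nothing about any
kernel architecture.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Userspace analogy only: C11 atomics stand in for the kernel primitives. */
static atomic_int X, Y;
static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static int r0, r1;

static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&X, 1, memory_order_relaxed); /* WRITE_ONCE(X, 1) */
	pthread_mutex_lock(&S);                             /* spin_lock(S) */
	atomic_thread_fence(memory_order_seq_cst);          /* smp_mb__after_spinlock() */
	r0 = atomic_load_explicit(&Y, memory_order_relaxed); /* READ_ONCE(Y) */
	pthread_mutex_unlock(&S);                           /* spin_unlock(S) */
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&Y, 1, memory_order_relaxed); /* WRITE_ONCE(Y, 1) */
	atomic_thread_fence(memory_order_seq_cst);          /* smp_mb() */
	r1 = atomic_load_explicit(&X, memory_order_relaxed); /* READ_ONCE(X) */
	return NULL;
}

/*
 * Hypothetical helper: runs the two-thread test `iterations` times and
 * returns how often the forbidden outcome r0 == 0 && r1 == 0 was seen.
 */
int sb_count_forbidden(int iterations)
{
	int seen = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t0, t1;
		int rc;

		atomic_store(&X, 0);
		atomic_store(&Y, 0);
		rc = pthread_create(&t0, NULL, cpu0, NULL);
		assert(rc == 0);
		rc = pthread_create(&t1, NULL, cpu1, NULL);
		assert(rc == 0);
		(void)rc;
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		/* The joins order the plain accesses to r0 and r1. */
		seen += (r0 == 0 && r1 == 0);
	}
	return seen;
}
```

Calling sb_count_forbidden(1000) should return 0, because the two seq_cst
fences are totally ordered and whichever comes second must see the other
thread's store. Replacing either fence with a weaker one makes the forbidden
outcome permitted by the C11 model, which is the store-buffering shape.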

> + *
> + * 2) Given the snippet:
> + *
> + * { X = 0; Y = 0; }
> + *
> + * CPU0                 CPU1                         CPU2
> + *
> + * spin_lock(S);        spin_lock(S);                r1 = READ_ONCE(Y);
> + * WRITE_ONCE(X, 1);    smp_mb__after_spinlock();    smp_rmb();
> + * spin_unlock(S);      r0 = READ_ONCE(X);           r2 = READ_ONCE(X);
> + *                      WRITE_ONCE(Y, 1);
> + *                      spin_unlock(S);
> + *
> + * it is forbidden that CPU0's critical section executes before CPU1's
> + * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
> + * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
> + * preceding the calls to smp_rmb() in try_to_wake_up() for similar
> + * snippets but "projected" onto two CPUs.
> + *
> + * Property (2) upgrades the lock to an RCsc lock.
> *
> * Since most load-store architectures implement ACQUIRE with an smp_mb() after
> * the LL/SC loop, they need no further barriers. Similarly all our TSO
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da8f12119a127..ec9ef0aec71ac 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * be possible to, falsely, observe p->on_rq == 0 and get stuck
> * in smp_cond_load_acquire() below.
> *
> - * sched_ttwu_pending() try_to_wake_up()
> - * [S] p->on_rq = 1; [L] P->state
> - * UNLOCK rq->lock -----.
> - * \
> - * +--- RMB
> - * schedule() /
> - * LOCK rq->lock -----'
> - * UNLOCK rq->lock
> + * sched_ttwu_pending() try_to_wake_up()
> + * STORE p->on_rq = 1 LOAD p->state
> + * UNLOCK rq->lock
> + *
> + * __schedule() (switch to task 'p')
> + * LOCK rq->lock smp_rmb();
> + * smp_mb__after_spinlock();
> + * UNLOCK rq->lock
> *
> * [task p]
> - * [S] p->state = UNINTERRUPTIBLE [L] p->on_rq
> + * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
> *
> - * Pairs with the UNLOCK+LOCK on rq->lock from the
> - * last wakeup of our task and the schedule that got our task
> - * current.
> + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> + * __schedule(). See the comment for smp_mb__after_spinlock().
> */
> smp_rmb();
> if (p->on_rq && ttwu_remote(p, wake_flags))
> @@ -2027,15 +2026,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * One must be running (->on_cpu == 1) in order to remove oneself
> * from the runqueue.
> *
> - * [S] ->on_cpu = 1; [L] ->on_rq
> - * UNLOCK rq->lock
> - * RMB
> - * LOCK rq->lock
> - * [S] ->on_rq = 0; [L] ->on_cpu
> + * __schedule() (switch to task 'p') try_to_wake_up()
> + * STORE p->on_cpu = 1 LOAD p->on_rq
> + * UNLOCK rq->lock
> + *
> + * __schedule() (put 'p' to sleep)
> + * LOCK rq->lock smp_rmb();
> + * smp_mb__after_spinlock();
> + * STORE p->on_rq = 0 LOAD p->on_cpu
> *
> - * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
> - * from the consecutive calls to schedule(); the first switching to our
> - * task, the second putting it to sleep.
> + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> + * __schedule(). See the comment for smp_mb__after_spinlock().
> */
> smp_rmb();
>
> --
> 2.7.4
>
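
The three-CPU snippet in (2) can be approximated in userspace in a similar
way. Again this is a loose sketch under stated assumptions, not part of the
patch: a pthread mutex stands in for the spinlock, a C11 seq_cst fence for
smp_mb__after_spinlock(), and an acquire fence for smp_rmb(). The name
rcsc_count_forbidden() is hypothetical. In the C11 model the fence after the
lock is what links CPU0's critical section to CPU2's reads; that is only an
analogy for the RCsc upgrade described in the header comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Userspace analogy only: C11 atomics stand in for the kernel primitives. */
static atomic_int X, Y;
static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static int r0, r1, r2;

static void *rcsc_cpu0(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&S);                             /* spin_lock(S) */
	atomic_store_explicit(&X, 1, memory_order_relaxed); /* WRITE_ONCE(X, 1) */
	pthread_mutex_unlock(&S);                           /* spin_unlock(S) */
	return NULL;
}

static void *rcsc_cpu1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&S);                             /* spin_lock(S) */
	atomic_thread_fence(memory_order_seq_cst);          /* smp_mb__after_spinlock() */
	r0 = atomic_load_explicit(&X, memory_order_relaxed); /* READ_ONCE(X) */
	atomic_store_explicit(&Y, 1, memory_order_relaxed); /* WRITE_ONCE(Y, 1) */
	pthread_mutex_unlock(&S);                           /* spin_unlock(S) */
	return NULL;
}

static void *rcsc_cpu2(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&Y, memory_order_relaxed); /* READ_ONCE(Y) */
	atomic_thread_fence(memory_order_acquire);          /* smp_rmb() stand-in */
	r2 = atomic_load_explicit(&X, memory_order_relaxed); /* READ_ONCE(X) */
	return NULL;
}

/*
 * Hypothetical helper: runs the three-thread test `iterations` times and
 * returns how often the forbidden outcome r0 == 1 && r1 == 1 && r2 == 0
 * was seen.
 */
int rcsc_count_forbidden(int iterations)
{
	void *(*fn[3])(void *) = { rcsc_cpu0, rcsc_cpu1, rcsc_cpu2 };
	int seen = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t[3];

		atomic_store(&X, 0);
		atomic_store(&Y, 0);
		for (int j = 0; j < 3; j++) {
			int rc = pthread_create(&t[j], NULL, fn[j], NULL);
			assert(rc == 0);
			(void)rc;
		}
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);
		seen += (r0 == 1 && r1 == 1 && r2 == 0);
	}
	return seen;
}
```

Calling rcsc_count_forbidden(200) should return 0. Dropping the seq_cst fence
in rcsc_cpu1() removes the only release-like ordering before the relaxed store
to Y, and the C11 model then allows CPU2 to see Y == 1 but X == 0.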