Re: [PATCH] bcachefs: six locks: Fix missing barrier on wait->lock_acquired

From: Boqun Feng
Date: Sat Aug 12 2023 - 16:59:19 EST


On Sat, Aug 12, 2023 at 12:58:34PM -0700, Boqun Feng wrote:
> On Sat, Aug 12, 2023 at 03:27:20PM -0400, Kent Overstreet wrote:
> > Six locks do lock handoff via the wakeup path: the thread doing the
> > wakeup also takes the lock on behalf of the waiter, which means the
> > waiter only has to look at its waitlist entry, and doesn't have to touch
> > the lock cacheline while another thread is using it.
> >
> > Linus noticed that this needs a real barrier, which this patch fixes.
> >
> > Also add a comment for the should_sleep_fn() error path.
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> > Cc: linux-bcachefs@xxxxxxxxxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > ---
> > fs/bcachefs/six.c | 33 +++++++++++++++++++++++++--------
> > 1 file changed, 25 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
> > index 581aee565e..b6ca53c852 100644
> > --- a/fs/bcachefs/six.c
> > +++ b/fs/bcachefs/six.c
> > @@ -223,14 +223,16 @@ static void __six_lock_wakeup(struct six_lock *lock, enum six_lock_type lock_typ
> > if (ret <= 0)
> > goto unlock;
> >
> > - __list_del(w->list.prev, w->list.next);
> > task = w->task;
> > + __list_del(w->list.prev, w->list.next);
> > /*
> > - * Do no writes to @w besides setting lock_acquired - otherwise
> > - * we would need a memory barrier:
> > + * The release barrier here ensures the ordering of the
> > + * __list_del before setting w->lock_acquired; @w is on the
> > + * stack of the thread doing the waiting and will be reused
> > + * after it sees w->lock_acquired with no other locking:
> > + * pairs with smp_load_acquire() in six_lock_slowpath()
> > */
> > - barrier();
> > - w->lock_acquired = true;
> > + smp_store_release(&w->lock_acquired, true);
> > wake_up_process(task);

Given the whole percpu counters for readers thing is similar to
percpu_rw_semaphore, I took a look at percpu_rwsem and wonder there is
a path to combine that with SIX lock. And that makes me realize another
fix may be needed here, considering the following case:

Task A Task B
====== ======
__six_lock_wakeup():
task = w->task;
...
smp_store_release(&w->locked_acquire, true);
six_lock_slowpath():
while (1) {
if (smp_load_acquire(->lock_acquired))
break;
}

six_unlock();
do_exit(); // Task B ends its life :(

wake_up_process(task); // @task is a dangling task pointer!!!

Looks like get_task_struct() and put_task_struct() are needed here:
similar to percpu_rwsem_wake_function().

[Copy Peter as well]

Regards,
Boqun

> > }
> >
> > @@ -502,17 +504,32 @@ static int six_lock_slowpath(struct six_lock *lock, enum six_lock_type type,
> > while (1) {
> > set_current_state(TASK_UNINTERRUPTIBLE);
> >
> > - if (wait->lock_acquired)
> > + /*
> > + * Ensures that writes to the waitlist entry happen after we see
>
> Maybe my English, but "happen after" here is a little confusing: writes
> happen after the read of ->lock_acquired? How about
>
> /*
> * Ensures once we observe the write to
> * wait->lock_acquired, we must observe the writes to
> * the waitlist entry: pairs with smp_store_release in
> * __six_lock_wakeup
> */
>
> ?
>
> I haven't finished my review on the SIX lock, but this patch looks good
> to me, feel free to add:
>
> Reviewed-by: Boqun Feng <boqun.feng@xxxxxxxxx>
>
> Regards,
> Boqun
>
> > + * wait->lock_acquired: pairs with the smp_store_release in
> > + * __six_lock_wakeup
> > + */
> > + if (smp_load_acquire(&wait->lock_acquired))
> > break;
> >
> > ret = should_sleep_fn ? should_sleep_fn(lock, p) : 0;
> > if (unlikely(ret)) {
> > + bool acquired;
> > +
> > + /*
> > + * If should_sleep_fn() returns an error, we are
> > + * required to return that error even if we already
> > + * acquired the lock - should_sleep_fn() might have
> > + * modified external state (e.g. when the deadlock cycle
> > + * detector in bcachefs issued a transaction restart)
> > + */
> > raw_spin_lock(&lock->wait_lock);
> > - if (!wait->lock_acquired)
> > + acquired = wait->lock_acquired;
> > + if (!acquired)
> > list_del(&wait->list);
> > raw_spin_unlock(&lock->wait_lock);
> >
> > - if (unlikely(wait->lock_acquired))
> > + if (unlikely(acquired))
> > do_six_unlock_type(lock, type);
> > break;
> > }
> > --
> > 2.40.1
> >