Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

From: Waiman Long
Date: Fri Sep 30 2022 - 09:57:26 EST


On 9/30/22 00:46, Mukesh Ojha wrote:
Hi,

Thanks for looking into this issue.

On 9/29/2022 11:34 PM, Waiman Long wrote:
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping, yet fail to acquire the
lock even when the rwsem is free, if the following sequence happens:

   Non-first waiter       First waiter      Lock holder
   ----------------       ------------      -----------
   Acquire wait_lock
   rwsem_try_write_lock():
     Set handoff bit if RT or
       wait too long
     Set waiter->handoff_set
   Release wait_lock
                          Acquire wait_lock
                          Inherit waiter->handoff_set
                          Release wait_lock
                                             Clear owner
                                            Release lock
   if (waiter.handoff_set) {
     rwsem_spin_on_owner();
     if (OWNER_NULL)
       goto trylock_again;
   }
   trylock_again:
   Acquire wait_lock
   rwsem_try_write_lock():
      if (first->handoff_set && (waiter != first))
          return false;
   Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter, as this can lead to
livelock: the RT task keeps spinning and retrying the trylock, while the
first waiter, for which the handoff bit reserves the lock, never gets a
chance to run and take it.
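
For illustration only (not part of the patch): a minimal user-space sketch
of the sequence above. The names fake_rwsem, fake_waiter and try_write_lock
are simplified stand-ins for the rwsem.c internals, not the real kernel
code; the sketch only shows why the non-first waiter's trylock keeps
failing once the handoff bit is set, even though the lock itself is free.

#include <stdbool.h>
#include <stdio.h>

struct fake_waiter {
	bool handoff_set;	/* stand-in for struct rwsem_waiter::handoff_set */
};

struct fake_rwsem {
	bool write_locked;	/* a writer currently holds the lock */
	bool handoff;		/* stand-in for RWSEM_FLAG_HANDOFF in sem->count */
};

/* Roughly models the (waiter != first) bail-out in rwsem_try_write_lock(). */
static bool try_write_lock(struct fake_rwsem *sem, struct fake_waiter *waiter,
			   struct fake_waiter *first)
{
	if (sem->handoff && waiter != first)
		return false;	/* lock is reserved for the first waiter */
	if (sem->write_locked)
		return false;
	sem->write_locked = true;
	sem->handoff = false;
	return true;
}

int main(void)
{
	struct fake_rwsem sem = { .write_locked = true, .handoff = false };
	struct fake_waiter first = { false }, nonfirst = { false };

	/* Non-first waiter (RT or waited too long) sets the handoff bit ... */
	sem.handoff = true;
	nonfirst.handoff_set = true;	/* ... and, before this patch, its own handoff_set */

	/* First waiter wakes up, inherits handoff_set, releases wait_lock. */
	first.handoff_set = true;

	/* Lock holder clears owner and releases the lock. */
	sem.write_locked = false;

	/*
	 * Non-first waiter: handoff_set is true and the owner is gone, so it
	 * retries the trylock, which fails because the handoff bit reserves
	 * the lock for the first waiter.  An RT task pinned to the first
	 * waiter's CPU would loop here forever; one iteration is shown.
	 */
	if (!try_write_lock(&sem, &nonfirst, &first))
		puts("non-first waiter: trylock failed although the lock is free");

	return 0;
}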

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
  kernel/locking/rwsem.c | 13 ++++++++++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262f635e..ad676e99e0b3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
                 new = count;
 
                 if (count & RWSEM_LOCK_MASK) {
+                        /*
+                         * A waiter (first or not) can set the handoff bit
+                         * if it is an RT task or wait in the wait queue
+                         * for too long.
+                         */
                         if (has_handoff || (!rt_task(waiter->task) &&
                                             !time_after(jiffies, waiter->timeout)))
                                 return false;
@@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
         } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
         /*
-         * We have either acquired the lock with handoff bit cleared or
-         * set the handoff bit.
+         * We have either acquired the lock with handoff bit cleared or set
+         * the handoff bit. Only the first waiter can have its handoff_set
+         * set here to enable optimistic spinning in slowpath loop.
          */
         if (new & RWSEM_FLAG_HANDOFF) {
-                waiter->handoff_set = true;
+                if (waiter == first)
+                        waiter->handoff_set = true;
                 lockevent_inc(rwsem_wlock_handoff);

nit: Should this not get incremented only when waiter->handoff_set is set to true?

The lock event counter records the number of times the handoff bit is set. It is not related to how handoff_set in the waiter structure is being set.

cheers,
Longman