Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

From: Mukesh Ojha
Date: Fri Sep 30 2022 - 12:04:15 EST


Hi,

On 9/30/2022 7:38 PM, Waiman Long wrote:
On 9/30/22 01:06, Mukesh Ojha wrote:
Hi,

On 9/29/2022 11:34 PM, Waiman Long wrote:
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

   Non-first waiter       First waiter      Lock holder
   ----------------       ------------      -----------
   Acquire wait_lock
   rwsem_try_write_lock():
     Set handoff bit if RT or
       wait too long
     Set waiter->handoff_set
   Release wait_lock
                          Acquire wait_lock
                          Inherit waiter->handoff_set
                          Release wait_lock
                       Clear owner
                                            Release lock
   if (waiter.handoff_set) {
     rwsem_spin_on_owner(();
     if (OWNER_NULL)
       goto trylock_again;
   }
   trylock_again:
   Acquire wait_lock
   rwsem_try_write_lock():
      if (first->handoff_set && (waiter != first))
          return false;
   Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter as this can lead to
live lock.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
  kernel/locking/rwsem.c | 13 ++++++++++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262f635e..ad676e99e0b3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
          new = count;
            if (count & RWSEM_LOCK_MASK) {
+            /*
+             * A waiter (first or not) can set the handoff bit
+             * if it is an RT task or wait in the wait queue
+             * for too long.
+             */
              if (has_handoff || (!rt_task(waiter->task) &&
                          !time_after(jiffies, waiter->timeout)))

Not related to this issue, however wanted to understand the idea about this.

If RT task comes in any order either come first or later it is setting the RWSEM_FLAG_HANDOFF bit.
So, here we are giving some priority right a way to RT task however it
can not get waiter->handoff_set=true since it is not the first waiter.(after this patch), is it not conflicting ?

I have thought about moving the RT task forward in the wait queue, but then it will greatly complicate the code to try to do what a PREEMPT_RT kernel does using a rt_mutex variant of rwsem. The reason why HANDOFF bit is set when an RT task is in the wait queue is speed up the progression of the wait queue without the interference of optimistic spinner.


Thanks for taking time to explain the motivation behind it.

It will take sometime to test your patch from our side however, like the other patches this patch too seem to fix the issue.

Feel free to add
Reported-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
Reviewed-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>

-Mukesh



Why can't we just keep like as below and not set
new |= RWSEM_FLAG_HANDOFF; and return false from here.

--------------0<------------------------------------
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262..dbe3e16 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct
rw_semaphore *sem,
                 new = count;

                 if (count & RWSEM_LOCK_MASK) {
-                       if (has_handoff || (!rt_task(waiter->task) &&
-                                           !time_after(jiffies,
waiter->timeout)))
+                       if (has_handoff || (rt_task(waiter->task) &&
waiter != first) ||
+                          (!rt_task(waiter->task) &&
!time_after(jiffies, waiter->timeout)))
                                 return false;

As I said above, we want to make more forward progress in the wait queue if a RT task is waiting there to try to reduce its latency. That is the point of that if statement.

Cheers,
Longman