Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set

From: Waiman Long
Date: Sun Nov 07 2021 - 10:24:48 EST


On 11/7/21 04:01, Hillf Danton wrote:
On Sat, 6 Nov 2021 23:25:38 -0400 Waiman Long wrote:
On 11/6/21 08:39, 马振华 wrote:
Dear longman,

Recently, I found an issue where the rwsem count becomes a negative
value. It always happens when a task tries to take the lock with
__down_write_killable() and is then killed.

The issue happens like this:

            CPU2                                CPU4
        task A [reader]                     task B [writer]

    down_read_killable()  [locked]
    sem->count = 0x100
                                        down_write_killable()
                                        sem->count = 0x102 [wait list not empty]
    up_read()
    sem->count = 0x2
                                        sig kill received
    down_read_killable()
    sem->count = 0x102 [wait list not empty]
                                        goto out_nolock:
                                            list_del(&waiter.list);
                                            wait list is now empty
                                            sem->count -= RWSEM_FLAG_HANDOFF
                                            sem->count = 0xFE
                                            list_empty(&sem->wait_list) is TRUE
                                            sem->count &= ~RWSEM_FLAG_WAITERS
                                            sem->count = 0xFC
    up_read()
    sem->count -= 0x100
    sem->count = 0xFFFFFFFFFFFFFFFC
    DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);

So sem->count becomes negative after the writer is killed.
I think that if the RWSEM_FLAG_HANDOFF flag is not actually set, we
shouldn't clear it.
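
For reference, the count bits behind the hex values above and the
out_nolock exit that the trace walks through look roughly like this in
a v5.15-era kernel/locking/rwsem.c (a sketch reconstructed from the
trace and the diff context further down, not a verbatim copy):

/* Count layout assumed by the trace above: */
#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_WAITERS	(1UL << 1)	/* 0x2 */
#define RWSEM_FLAG_HANDOFF	(1UL << 2)	/* 0x4 */
#define RWSEM_READER_BIAS	(1UL << 8)	/* 0x100 per reader */

/* out_nolock exit of rwsem_down_write_slowpath(), taken by the killed writer: */
out_nolock:
	__set_current_state(TASK_RUNNING);
	raw_spin_lock_irq(&sem->wait_lock);
	list_del(&waiter.list);

	/* Subtracts 0x4 even if the handoff bit never made it into sem->count. */
	if (unlikely(wstate == WRITER_HANDOFF))
		atomic_long_add(-RWSEM_FLAG_HANDOFF, &sem->count);

	if (list_empty(&sem->wait_list))
		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
	raw_spin_unlock_irq(&sem->wait_lock);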
Thanks for reporting this possible race condition.

However, I am still trying to figure out how it is possible to set
wstate to WRITER_HANDOFF without actually setting the handoff bit as
well. The statement sequence should be as follows:

wstate = WRITER_HANDOFF;
raw_spin_lock_irq(&sem->wait_lock);
if (rwsem_try_write_lock(sem, wstate))
	break;		/* lock acquired */
raw_spin_unlock_irq(&sem->wait_lock);
  :
if (signal_pending_state(state, current))
	goto out_nolock;

The rwsem_try_write_lock() function will make sure that we either
acquire the lock and clear handoff or set the handoff bit. This should
be done before we actually check for signal. I do think that it is
probably safer to use atomic_long_andnot to clear the handoff bit
instead of using atomic_long_add().

Given that WRITER_HANDOFF does not guarantee that RWSEM_FLAG_HANDOFF is
actually set in rwsem_try_write_lock(), the flag should be cleared only
by its setter to avoid a count underflow.

Hillf
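
To make the add vs. andnot difference concrete, here is a small
stand-alone user-space sketch (plain C, not kernel code; the FLAG_* and
READER_BIAS constants are simplified stand-ins for the real rwsem bits)
that replays the count values from the trace with both ways of removing
the handoff bit:

#include <stdio.h>

#define FLAG_WAITERS	0x2L
#define FLAG_HANDOFF	0x4L
#define READER_BIAS	0x100L

int main(void)
{
	/* One reader holds the lock and a writer is queued: count == 0x102. */
	long add_path    = READER_BIAS | FLAG_WAITERS;
	long andnot_path = READER_BIAS | FLAG_WAITERS;

	/* The killed writer leaves without ever having set FLAG_HANDOFF. */
	add_path    -= FLAG_HANDOFF;	/* 0xfe: borrows from the reader count */
	andnot_path &= ~FLAG_HANDOFF;	/* 0x102: harmless no-op */

	/* Wait list is now empty, so WAITERS is cleared; then the reader's up_read. */
	add_path    = (add_path    & ~FLAG_WAITERS) - READER_BIAS;	/* -4 */
	andnot_path = (andnot_path & ~FLAG_WAITERS) - READER_BIAS;	/*  0 */

	printf("add path: %ld, andnot path: %ld\n", add_path, andnot_path);
	return 0;
}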

I did have a tentative patch to address this issue which is somewhat similar to your approach. However, I would like to further investigate the exact mechanics of the race condition to make sure that I won't miss a latent bug somewhere else in the rwsem code.

-Longman

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c51387a43265..20be967620c0 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -347,7 +347,8 @@ enum rwsem_wake_type {
 enum writer_wait_state {
     WRITER_NOT_FIRST,    /* Writer is not first in wait list */
     WRITER_FIRST,        /* Writer is first in wait list     */
-    WRITER_HANDOFF        /* Writer is first & handoff needed */
+    WRITER_NEED_HANDOFF,    /* Writer is first & handoff needed */
+    WRITER_HANDOFF        /* Writer is first & handoff set */
 };

 /*
@@ -532,11 +533,11 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  *
- * If wstate is WRITER_HANDOFF, it will make sure that either the handoff
+ * If wstate is WRITER_NEED_HANDOFF, it will make sure that either the handoff
  * bit is set or the lock is acquired with handoff bit cleared.
  */
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-                    enum writer_wait_state wstate)
+                    enum writer_wait_state *wstate)
 {
     long count, new;

@@ -546,13 +547,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
     do {
         bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);

-        if (has_handoff && wstate == WRITER_NOT_FIRST)
+        if (has_handoff && *wstate == WRITER_NOT_FIRST)
             return false;

         new = count;

         if (count & RWSEM_LOCK_MASK) {
-            if (has_handoff || (wstate != WRITER_HANDOFF))
+            if (has_handoff || (*wstate != WRITER_NEED_HANDOFF))
                 return false;

             new |= RWSEM_FLAG_HANDOFF;
@@ -569,8 +570,10 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
      * We have either acquired the lock with handoff bit cleared or
      * set the handoff bit.
      */
-    if (new & RWSEM_FLAG_HANDOFF)
+    if (new & RWSEM_FLAG_HANDOFF) {
+        *wstate = WRITER_HANDOFF;
         return false;
+    }

     rwsem_set_owner(sem);
     return true;
@@ -1083,7 +1086,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
     /* wait until we successfully acquire the lock */
     set_current_state(state);
     for (;;) {
-        if (rwsem_try_write_lock(sem, wstate)) {
+        if (rwsem_try_write_lock(sem, &wstate)) {
             /* rwsem_try_write_lock() implies ACQUIRE on success */
             break;
         }
@@ -1138,7 +1141,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
              */
             if ((wstate == WRITER_FIRST) && (rt_task(current) ||
                 time_after(jiffies, waiter.timeout))) {
-                wstate = WRITER_HANDOFF;
+                wstate = WRITER_NEED_HANDOFF;
                 lockevent_inc(rwsem_wlock_handoff);
                 break;
             }
@@ -1159,7 +1162,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
     list_del(&waiter.list);

     if (unlikely(wstate == WRITER_HANDOFF))
-        atomic_long_add(-RWSEM_FLAG_HANDOFF, &sem->count);
+        atomic_long_andnot(RWSEM_FLAG_HANDOFF, &sem->count);

     if (list_empty(&sem->wait_list))
         atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
--
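
With the tentative patch applied, the out_nolock clean-up reads roughly
as below. WRITER_HANDOFF now means rwsem_try_write_lock() has really
put the bit into sem->count, so the clear cannot underflow, and using
andnot instead of an unconditional subtraction would be harmless even
if the bit were somehow clear (sketch of the resulting code, for
illustration only):

out_nolock:
	...
	list_del(&waiter.list);

	/* Only true if rwsem_try_write_lock() actually set the bit. */
	if (unlikely(wstate == WRITER_HANDOFF))
		atomic_long_andnot(RWSEM_FLAG_HANDOFF, &sem->count);

	if (list_empty(&sem->wait_list))
		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);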