[PATCH 2/2] locking/rwsem: Wake readers in a reader-owned rwsem if first waiter is a reader

From: Waiman Long
Date: Fri Mar 18 2022 - 12:19:50 EST


In an analysis of a recent vmcore, a reader-owned rwsem was found with
385 readers but no writer in the wait queue. That is unusual and may be
caused by race conditions that are not yet fully understood. In such a
case, all the readers in the wait queue should join the other
reader-owners and acquire the read lock.

In rwsem_down_write_slowpath(), an incoming writer will try to wake up
the front readers under such a circumstance. That is not the case for
rwsem_down_read_slowpath(); modify the code to do the same there. This
includes the originally supported case where the wait queue is empty
and the incoming reader wakes up itself.
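
The wake decision in the reader slowpath then reduces to the check
sketched below. This is a simplified standalone sketch, not the literal
kernel code: should_wake_front_waiters() is a made-up helper name,
while the RWSEM_* masks and rwsem_first_waiter() are as defined in
kernel/locking/rwsem.c.

	/*
	 * Minimal sketch of the new wake decision (helper name is
	 * hypothetical; must be called with sem->wait_lock held).
	 */
	static bool should_wake_front_waiters(struct rw_semaphore *sem,
					      long count)
	{
		/* Lock is completely free: wake the front queued process(es). */
		if (!(count & RWSEM_LOCK_MASK))
			return true;

		/* No writer-owner (may be reader-owned) and a reader is first. */
		if (!(count & RWSEM_WRITER_MASK) &&
		    rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
			return true;

		return false;
	}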

With CONFIG_LOCK_EVENT_COUNTS enabled, the newly added rwsem_rlock_rwake
event counter had 13 hits right after the bootup of a 2-socket system. So
the condition that a reader-owned rwsem has readers at the front of
the wait queue does happen fairly frequently. This patch will help to
speed things up in such cases.
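
For reference, with CONFIG_LOCK_EVENT_COUNTS the per-event counters are
exposed through debugfs, typically as one file per event under
/sys/kernel/debug/lockevent/. A minimal userspace sketch for sampling
the new counter, assuming that standard mount point:

	#include <stdio.h>

	int main(void)
	{
		/* Path assumes debugfs is mounted at its usual location. */
		FILE *f = fopen("/sys/kernel/debug/lockevent/rwsem_rlock_rwake", "r");
		unsigned long hits;

		if (!f) {
			perror("rwsem_rlock_rwake");
			return 1;
		}
		if (fscanf(f, "%lu", &hits) == 1)
			printf("rwsem_rlock_rwake = %lu\n", hits);
		fclose(f);
		return 0;
	}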

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem.c            | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 97fb6f3f840a..9bb9f048848b 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -64,6 +64,7 @@ LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
 LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
+LOCK_EVENT(rwsem_rlock_rwake)	/* # of reader wakeups in slow path	*/
 LOCK_EVENT(rwsem_wlock)		/* # of write locks acquired		*/
 LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions	*/
 LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs		*/
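
The new line follows the existing LOCK_EVENT() pattern. Roughly, each
entry in lock_events_list.h expands into both an enum value and a
per-cpu counter slot; the sketch below paraphrases
kernel/locking/lock_events.h and may differ in detail from a given
kernel version:

	enum lock_events {
	#define LOCK_EVENT(name)	LOCKEVENT_ ## name,
	#include "lock_events_list.h"
	#undef LOCK_EVENT
		lockevent_num,	/* total number of lock event counts */
	};

	/* One per-cpu counter per event; incremented locklessly. */
	DECLARE_PER_CPU(unsigned long, lockevents[lockevent_num]);

	#define lockevent_inc(ev)	this_cpu_inc(lockevents[LOCKEVENT_ ## ev])
	#define lockevent_cond_inc(ev, c)		\
	do {						\
		if (c)					\
			lockevent_inc(ev);		\
	} while (0)
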
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index f71a9693d05a..53f7f0b4724a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -997,17 +997,24 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	count = atomic_long_add_return(adjustment, &sem->count);
 
 	/*
-	 * If there are no active locks, wake the front queued process(es).
-	 *
-	 * If there are no writers and we are first in the queue,
-	 * wake our own waiter to join the existing active readers !
+	 * Do a rwsem_mark_wake() under one of the following conditions:
+	 * 1) there is no active read or write lock.
+	 * 2) there is no writer-owner (can be reader-owned) and the first
+	 *    waiter is a reader.
 	 */
 	if (!(count & RWSEM_LOCK_MASK)) {
 		clear_nonspinnable(sem);
 		wake = true;
+	} else if (!(count & RWSEM_WRITER_MASK)) {
+		wake = rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ;
+		/*
+		 * Count the cases where readers at the front of a
+		 * previously non-empty wait queue are woken up.
+		 */
+		lockevent_cond_inc(rwsem_rlock_rwake,
+				   wake && !(adjustment & RWSEM_FLAG_WAITERS));
 	}
-	if (wake || (!(count & RWSEM_WRITER_MASK) &&
-		     (adjustment & RWSEM_FLAG_WAITERS)))
+	if (wake)
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
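
For readers of the diff, the waiter-type check above relies on the
following definitions from kernel/locking/rwsem.c (paraphrased, with
some struct fields omitted):

	enum rwsem_waiter_type {
		RWSEM_WAITING_FOR_WRITE,
		RWSEM_WAITING_FOR_READ
	};

	struct rwsem_waiter {
		struct list_head list;
		struct task_struct *task;
		enum rwsem_waiter_type type;
		unsigned long timeout;
		/* other fields omitted */
	};

	#define rwsem_first_waiter(sem) \
		list_first_entry(&(sem)->wait_list, struct rwsem_waiter, list)
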
--
2.27.0