Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Waiman Long
Date: Thu Jan 21 2016 - 18:02:55 EST


On 01/21/2016 04:29 AM, Ding Tianhong wrote:
I build a script to create several process for ioctl loop calling,
the ioctl will calling the kernel function just like:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
The function may sleep several ms, but will not halt, at the same time
another user service may calling ifconfig to change the state of the
ethernet, and after several hours, the hung task thread report this problem:

========================================================================
149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
[149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0

================================ cut here ================================

I got the vmcore and found that the ifconfig is already in the wait_list of the
rtnl_lock for 120 second, but my process could get and release the rtnl_lock
normally several times in one second, so it means that my process jump the
queue and the ifconfig couldn't get the rtnl all the time, I check the mutex lock
slow path and found that the mutex may spin on owner ignore whether the wait list
is empty, it will cause the task in the wait list always be cut in line, so add
test for wait list in the mutex_can_spin_on_owner and avoid this problem.

Signed-off-by: Ding Tianhong<dingtianhong@xxxxxxxxxx>
Cc: Ingo Molnar<mingo@xxxxxxxxxx>
Cc: Peter Zijlstra<peterz@xxxxxxxxxxxxx>
Cc: Davidlohr Bueso<dave@xxxxxxxxxxxx>
Cc: Linus Torvalds<torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Paul E. McKenney<paulmck@xxxxxxxxxx>
Cc: Thomas Gleixner<tglx@xxxxxxxxxxxxx>
Cc: Will Deacon<will.deacon@xxxxxxx>
Cc: Jason Low<jason.low2@xxxxxx>
Cc: Tim Chen<tim.c.chen@xxxxxxxxxxxxxxx>
Cc: Waiman Long<Waiman.Long@xxxxxx>
---
kernel/locking/mutex.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
struct task_struct *owner;
int retval = 1;

- if (need_resched())
+ if (need_resched() || atomic_read(&lock->count) == -1)
return 0;

rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
/*
* Optimistic spinning.
*
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
*
* Since this needs the lock owner, and this mutex implementation
* doesn't track the owner atomically in the lock field, we need to

This patch will largely defeat the performance benefit of optimistic spinning. I have an alternative solution to this live-lock problem. Would you mind trying out the attached patch to see if it can fix your problem?

Cheers,
Longman

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long <Waiman.Long@xxxxxxx>
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling the
a woken task in the wait list to enter optimistic spinning loop itself
with precedence over the ones in the OSQ. This should prevent the
live-lock
condition from happening.

Signed-off-by: Waiman Long <Waiman.Long@xxxxxxx>
---
include/linux/mutex.h | 2 +
kernel/locking/mutex.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..2c55ecd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -57,6 +57,8 @@ struct mutex {
#endif
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
struct optimistic_spin_queue osq; /* Spinner MCS lock */
+ /* Set if wait list head actively spinning */
+ int wlh_spinning;
#endif
#ifdef CONFIG_DEBUG_MUTEXES
void *magic;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..8b27b03 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -55,6 +55,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
mutex_clear_owner(lock);
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
osq_lock_init(&lock->osq);
+ lock->wlh_spinning = false;
#endif

debug_mutex_init(lock, name, key);
@@ -346,8 +347,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
if (owner && !mutex_spin_on_owner(lock, owner))
break;

- /* Try to acquire the mutex if it is unlocked. */
- if (mutex_try_to_acquire(lock)) {
+ /*
+ * Try to acquire the mutex if it is unlocked and the wait
+ * list head isn't spinning on the lock.
+ */
+ if (!READ_ONCE(lock->wlh_spinning) &&
+ mutex_try_to_acquire(lock)) {
lock_acquired(&lock->dep_map, ip);

if (use_ww_ctx) {
@@ -398,12 +403,91 @@ done:

return false;
}
+
+/*
+ * Wait list head optimistic spinning
+ *
+ * The wait list head, when woken up, will try to spin on the lock if the
+ * lock owner is active. It will also set the wlh_spinning flag to give
+ * itself a higher chance of getting the lock than the other optimisically
+ * spinning locker in the OSQ.
+ */
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+ struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+ struct task_struct *owner, *task = current;
+ int gotlock = false;
+
+ WRITE_ONCE(lock->wlh_spinning, true);
+ while (true) {
+ if (use_ww_ctx && ww_ctx->acquired > 0) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
+ /*
+ * If ww->ctx is set the contents are undefined, only
+ * by acquiring wait_lock there is a guarantee that
+ * they are not invalid when reading.
+ *
+ * As such, when deadlock detection needs to be
+ * performed the optimistic spinning cannot be done.
+ */
+ if (READ_ONCE(ww->ctx))
+ break;
+ }
+
+ /*
+ * If there's an owner, wait for it to either
+ * release the lock or go to sleep.
+ */
+ owner = READ_ONCE(lock->owner);
+ if (owner && !mutex_spin_on_owner(lock, owner))
+ break;
+
+ /*
+ * Try to acquire the mutex if it is unlocked. The mutex
+ * value is set to -1 which will be changed to 0 later on
+ * if the wait list becomes empty.
+ */
+ if (!mutex_is_locked(lock) &&
+ (atomic_cmpxchg_acquire(&lock->count, 1, -1) == 1)) {
+ gotlock = true;
+ break;
+ }
+
+ /*
+ * When there's no owner, we might have preempted between the
+ * owner acquiring the lock and setting the owner field. If
+ * we're an RT task that will live-lock because we won't let
+ * the owner complete.
+ */
+ if (!owner && (need_resched() || rt_task(task)))
+ break;
+
+ /*
+ * The cpu_relax() call is a compiler barrier which forces
+ * everything in this loop to be re-loaded. We don't need
+ * memory barriers as we'll eventually observe the right
+ * values at the cost of a few extra spins.
+ */
+ cpu_relax_lowlatency();
+
+ }
+ WRITE_ONCE(lock->wlh_spinning, false);
+ return gotlock;
+}
#else
static bool mutex_optimistic_spin(struct mutex *lock,
struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
{
return false;
}
+
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+ struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+ return false;
+}
#endif

__visible __used noinline
@@ -543,6 +627,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
lock_contended(&lock->dep_map, ip);

for (;;) {
+ int gotlock;
+
/*
* Lets try to take the lock again - this is needed even if
* we get here for the first time (shortly after failing to
@@ -577,7 +663,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/* didn't get the lock, go to sleep: */
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();
+
+ /* optimistically spinning on the mutex without the wait lock */
+ gotlock = mutex_wlh_opt_spin(lock, ww_ctx, use_ww_ctx);
spin_lock_mutex(&lock->wait_lock, flags);
+ if (gotlock)
+ break;
}
__set_task_state(task, TASK_RUNNING);

--
1.7.1