Re: [PATCH v2 2/5] locking/pvqspinlock: Fix missed PV wakeup problem

From: Waiman Long
Date: Sun Jul 17 2016 - 19:11:12 EST


On 07/17/2016 07:07 PM, Waiman Long wrote:
On 07/15/2016 09:16 PM, Boqun Feng wrote:
On Fri, Jul 15, 2016 at 06:35:56PM +0200, Peter Zijlstra wrote:
On Fri, Jul 15, 2016 at 12:07:03PM +0200, Peter Zijlstra wrote:
So if we are kicked by the unlock_slowpath, and the lock is stealed by
someone else, we need hash its node again and set l->locked to
_Q_SLOW_VAL, then enter pv_wait.
Right, let me go think about this a bit.
Urgh, brain hurt.

So I _think_ the below does for it but I could easily have missed yet
another case.

Waiman's patch has the problem that it can have two pv_hash() calls for
the same lock in progress and I'm thinking that means we can hit the
BUG() in pv_hash() due to that.

I think Waiman's patch does have the problem of two pv_hash() calls for
the same lock in progress. As I mentioned in the first version:

http://lkml.kernel.org/g/20160527074331.GB8096@insomnia

And he tried to address this in the patch #3 of this series. However,
I think what is proper here is either to reorder patch 2 and 3 or to
merge patch 2 and 3, otherwise, we are introducing a bug in the middle
of this series.

Thoughts, Waiman?

Patches 2 and 3 can be reversed.


That said, I found Peter's way is much simpler and easier to understand
;-)

I agree. Peter's patch is better than mine.

If we can't, it still has a problem because its not telling us either.



--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -20,7 +20,8 @@
* native_queued_spin_unlock().
*/

-#define _Q_SLOW_VAL (3U<< _Q_LOCKED_OFFSET)
+#define _Q_HASH_VAL (3U<< _Q_LOCKED_OFFSET)
+#define _Q_SLOW_VAL (7U<< _Q_LOCKED_OFFSET)

/*
* Queue Node Adaptive Spinning
@@ -36,14 +37,11 @@
*/
#define PV_PREV_CHECK_MASK 0xff

-/*
- * Queue node uses: vcpu_running& vcpu_halted.
- * Queue head uses: vcpu_running& vcpu_hashed.
- */
enum vcpu_state {
- vcpu_running = 0,
- vcpu_halted, /* Used only in pv_wait_node */
- vcpu_hashed, /* = pv_hash'ed + vcpu_halted */
+ vcpu_node_running = 0,
+ vcpu_node_halted,
+ vcpu_head_running,
We actually don't need this extra running state, right? Because nobody
cares about the difference between two running states right now.

That addresses the problem in Xinhui patch that changed the state from halted to hashed. With that separation, that change is no longer necessary.

Oh, I meant Wanpeng's double hash race patch, not Xinhui's patch.

Cheers,
Longman