Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

From: Chris Metcalf
Date: Tue Apr 28 2015 - 14:01:11 EST


On 04/28/2015 01:43 PM, Peter Zijlstra wrote:
On Tue, Apr 28, 2015 at 12:58:55PM -0400, Chris Metcalf wrote:
On 04/28/2015 12:40 PM, Peter Zijlstra wrote:
On Tue, Apr 28, 2015 at 11:53:21AM -0400, Chris Metcalf wrote:

The reason we use two 32-bit fields on tilepro is that the only available
atomic instruction is tns (test and set), which sets a 32-bit "1" value
into the target memory and returns the old 32-bit value.
And you want a ticket lock as opposed to the test-and-set lock because
with 64 tiles starvation under contention is a real worry?
We see substantial unfairness under load with a plain spinlock,
basically because nearer cores on the mesh network can exponentially
crowd out further cores. The ticket lock avoids that, though we
have to be careful to do backoff when checking the lock to avoid
DDoS in the mesh network.
Does your arch have 16bit atomic load/stores ? If so, would something
like the below not make sense?

Yes, tilepro can do 16-bit atomic load/stores. The reason we didn't use
your approach (basically having tns provide locking for the head/tail)
is just a perceived efficiency gain from rolling the tns lock into the head.

The current tilepro arch_spin_lock() is just three mesh network transactions
(tns, store, load). Your proposed spin lock is five (tns, load, store, store, load).
Or, looking it from a core-centric perspective, the current arch_spin_lock()
only has to wait on requests from the mesh network twice (tns, load), basically
once for each member of the lock structure; your proposed version is three
(tns, load, load).

I don't honestly know how critical this difference is, but that's why I
designed it the way I did.

I think your goal with your proposed redesign is being able to atomically
read head and tail together for arch_spin_unlock_wait(), but I don't see
why that's better than just reading head, checking it's not equal to tail
with a separate read, then spinning waiting for head to change.


typedef struct {
union {
struct {
unsigned short head;
unsigned short tail;
};
unsigned int tickets;
};
unsigned int lock;
} arch_spinlock_t;

static inline void ___tns_lock(unsigned int *lock)
{
while (tns(lock))
cpu_relax();
}

static inline void ___tns_unlock(unsigned int *lock)
{
WRITE_ONCE(*lock, 0);
}

static inline void arch_spin_lock(arch_spinlock_t *lock)
{
unsigned short head, tail;

___tns_lock(&lock->lock); /* XXX does the TNS imply a ___sync? */
head = lock->head;
lock->head++;
___tns_unlock(&lock->lock);

while (READ_ONCE(lock->tail) != head)
cpu_relax();
}

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
/*
* can do with regular load/store because the lock owner
* is the only one going to do stores to the tail
*/
unsigned short tail = READ_ONCE(lock->tail);
smp_mb(); /* MB is stronger than RELEASE */
WRITE_ONCE(lock->tail, tail + 1);
}

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
union {
struct {
unsigned short head;
unsigned short tail;
};
unsigned int tickets;
} x;

for (;;) {
x.tickets = READ_ONCE(lock->tickets);
if (x.head == x.tail)
break;
cpu_relax();
}
}

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/