Re: spinaphore conceptual draft

From: Kyle Moffett
Date: Mon May 30 2005 - 13:09:28 EST


On May 30, 2005, at 13:46:35, Andi Kleen wrote:
>> tsc goes backwards on AMD? Under what circumstances (I'm curious,
>> since I'm running one...)

> It is not synchronized between CPUs and slowly drifts, and can even run
> at completely different frequencies there when you use powernow! on an
> MP system.

Unsynchronized is fine; we're only taking differences over short intervals.
Slow drift is likewise OK. As to the different frequencies, how
different are we talking about? Also, on such a system, is there any
way to determine a relative frequency, even if it's unreliable or
occasionally off?

> It can be used reliably when you only do deltas on the same CPU
> and correct for the changing frequency. However, then you run
> into other problems, like it being quite slow on Intel.

The deltas here are generally short, always on the same CPU, and can be
corrected for changing frequency, assuming that said frequency is
available somehow.

Ideally it would be some sort of CPU cycle-counter, just so I can say
"In between lock and unlock, the CPU did 1000 cycles", for some value
of "cycle".

> I suspect any attempt to use time stamps in locks is a bad
> idea because of this.

Something like this could be built only for CPUs that do support that
kind of cycle counter.

> My impression is that the aggressive bus access avoidance the
> original poster was trying to implement is not that useful
> anyway on modern systems, which have fast busses. Also
> it is not even clear it saves anything; after all, the
> CPU will always snoop cache accesses for all cache lines,
> and polling for the EXCLUSIVE transition of the local cache line
> is probably either free or very cheap.

The idea behind these locks is to handle heavily contended locks (say 32
threads doing write() on the same fd) on bigger systems (8-way or more).
In such a system, cacheline snooping isn't practical at the hardware
level, and a lock such as this should be able to send several CPUs off to
do other useful work (if there's any to be done) while minimizing the
cacheline banging.
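
In rough pseudo-C (user-space C11 atomics, with sched_yield() standing in
for whatever scheduler hook this would really use, and all the names made
up for the sketch), the contended path would look something like:

#include <stdatomic.h>
#include <stdint.h>
#include <sched.h>

#define MAX_SPIN_CYCLES 4000	/* made-up budget; would be tuned per machine */

struct spinaphore {
	atomic_int locked;		/* 0 = free, 1 = held */
	atomic_uint waiters;
	uint64_t avg_hold_cycles;	/* from the stats sketch above */
};

static void spinaphore_lock(struct spinaphore *sp)
{
	unsigned int waiters = atomic_fetch_add(&sp->waiters, 1) + 1;

	while (atomic_exchange_explicit(&sp->locked, 1, memory_order_acquire)) {
		/* Crude estimate of the queue ahead of us. */
		uint64_t expected = (uint64_t)waiters * sp->avg_hold_cycles;

		if (expected > MAX_SPIN_CYCLES) {
			/* Waiting looks more expensive than a reschedule:
			 * let this CPU go do other useful work. */
			sched_yield();
			continue;
		}

		/* Otherwise spin on plain reads, so the only traffic is the
		 * normal coherency on this one cache line. */
		while (atomic_load_explicit(&sp->locked, memory_order_relaxed))
			;
	}
	atomic_fetch_sub(&sp->waiters, 1);
}

static void spinaphore_unlock(struct spinaphore *sp)
{
	atomic_store_explicit(&sp->locked, 0, memory_order_release);
}

The real thing would hand control back to the scheduler properly rather
than call sched_yield(), but the shape is the same.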

> Optimizing for the congested case is always a bad idea anyway; if
> lock congestion is a problem it is better to fix the lock. And
> when you have a really congested lock that for some reason cannot be
> fixed, then using some kind of queued lock (which also gives you
> fairness on NUMA) is probably a better idea. But again, you should
> really fix the lock instead; everything else is the wrong answer.

Good point.
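
For reference, the simplest form of the queued lock you describe is a
ticket lock, something like this (again just a sketch in C11 atomics,
not what any kernel actually does):

#include <stdatomic.h>

struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* Each waiter spins until its own ticket comes up, so the lock is
	 * granted strictly in arrival order; that is where the fairness
	 * comes from. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}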

> BTW, getting the fairness of the backoff scheme right would probably
> have been a major problem.

Largely this is a fast guess heuristic, so max_time_before_schedule
could be this (and this would even work with priority inheritance):

effective_priority * waiters * avg_wait_time / 10;
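
Or, spelled out as code (every input is just a cheap per-lock statistic
we would already be keeping; none of these names are existing kernel
interfaces):

static inline uint64_t max_time_before_schedule(unsigned int effective_priority,
						unsigned int waiters,
						uint64_t avg_wait_time)
{
	/* Pure fast guess: scale the expected wait by the queue length and
	 * the task's priority, then shave it down so we err on the side of
	 * rescheduling. */
	return (uint64_t)effective_priority * waiters * avg_wait_time / 10;
}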

> The thing that seems to tickle CPU vendors much more is to avoid
> wasting CPU time on SMT or on hypervisors while spinning. That can
> all be done with much simpler hints (like rep; nop).

On such a system, a lock like this would inherit all the properties of
normal spinlocks, including any optimizations like that.
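
For completeness, that hint is a one-liner to fold into the inner wait
loop of the earlier sketch ("rep; nop" is the x86 PAUSE encoding Andi is
referring to; other architectures would need their own equivalent):

static inline void cpu_relax_hint(void)
{
	/* Tell SMT siblings and hypervisors that this hardware thread is
	 * just spinning, so they can give the resources to someone else. */
	__asm__ __volatile__("rep; nop" ::: "memory");
}

The inner read-spin loop above would simply call this on every iteration
instead of spinning on nothing.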



Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r !y?(-)
------END GEEK CODE BLOCK------


