Re: [PATCH] x86 rwsem optimization extreme

From: Zachary Amsden
Date: Wed Feb 17 2010 - 20:05:12 EST



On 02/17/2010 02:10 PM, Linus Torvalds wrote:
> The cost of 'adc' may happen to be identical in this case, but I suspect
> you didn't test on UP, where the 'lock' prefix goes away. An unlocked
> 'add' tends to be faster than an unlocked 'adc'.
>
> (It's possible that some micro-architectures don't care, since it's a
> memory op, and they can see that 'C' is set. But it's a fragile assumption
> that it would always be ok).

FWIW, I don't know of any microarchitecture where adc is slower than
add, *as long as* the setup time for the CF flag is already used up.
However, as I already commented, I don't think this is worth it. This
inline appears to only be instantiated once, and as such, it takes a
whopping six bytes across the entire kernel.


Without the locks,

stc; adc %rdx, (%rax)

vs.

add %rdx, (%rax)

shows no statistically significant difference on Intel.
On AMD, the first form is about twice as expensive.
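
For anyone who wants to repeat the unlocked comparison, a rough user-space
sketch might look like the one below. This is not the harness used for the
numbers above. The helper names (plain_add, stc_adc, time_plain_add,
time_stc_adc) are made up for illustration, it assumes GCC or Clang inline
asm on x86-64 (with a plain C fallback elsewhere that only preserves the
arithmetic, not the instructions), and clock() is coarse, so treat any
result as indicative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Unlocked "add %reg, (mem)": *p += v. */
static inline void plain_add(uint64_t *p, uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("addq %1, %0" : "+m"(*p) : "r"(v) : "cc");
#else
    /* Fallback (assumption): same arithmetic, no control over codegen. */
    *(volatile uint64_t *)p += v;
#endif
}

/* Unlocked "stc; adc %reg, (mem)": CF is set first, so *p += v + 1. */
static inline void stc_adc(uint64_t *p, uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("stc\n\tadcq %1, %0" : "+m"(*p) : "r"(v) : "cc");
#else
    /* Fallback (assumption): same arithmetic, no control over codegen. */
    *(volatile uint64_t *)p += v + 1;
#endif
}

/* Time `iters` back-to-back plain adds; returns elapsed CPU seconds. */
double time_plain_add(unsigned long iters)
{
    uint64_t counter = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++)
        plain_add(&counter, 1);
    clock_t end = clock();
    assert(counter == iters);            /* each step adds exactly 1 */
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Time `iters` back-to-back stc+adc pairs; returns elapsed CPU seconds. */
double time_stc_adc(unsigned long iters)
{
    uint64_t counter = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++)
        stc_adc(&counter, 0);            /* 0 + CF, so also adds 1 */
    clock_t end = clock();
    assert(counter == iters);
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Run both loops and print the raw timings. */
void report(unsigned long iters)
{
    printf("add:     %f s\n", time_plain_add(iters));
    printf("stc+adc: %f s\n", time_stc_adc(iters));
}
```

Calling report() with a large iteration count from a main() of your own
prints both timings. Note that each loop is a dependent chain of
read-modify-write ops on a single location, so it measures latency through
memory rather than throughput, which may or may not match what the rwsem
path actually sees.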

Of course, this is all completely useless as things stand, but it would
matter if the locks were inline (which is actually an askable question
now). There was just so much awesomeness going on with the 64-bit rwsem
constructs that I felt I had to add even more awesomeness to the plate.
For some definition of awesomeness.

Zach