Re: [PATCH] x86 rwsem optimization extreme

From: Linus Torvalds
Date: Wed Feb 17 2010 - 17:13:09 EST




On Wed, 17 Feb 2010, Zachary Amsden wrote:
>
> The x86 instruction set provides the ability to add an additional
> bit into addition or subtraction by using the carry flag.
> It also provides instructions to directly set or clear the
> carry flag. By forcibly setting the carry flag, we can then
> represent one particular 64-bit constant, namely
>
> 0xffffffff + 1 = 0x100000000
>
> using only 32-bit values. In particular we can optimize the rwsem
> write lock release by noting it is of exactly this form.

Don't do this.

Just shift the constants down by two bits, and suddenly you don't need any
clever tricks, because all the constants fit in 32 bits anyway,
regardless of sign issues.

So just change the

# define RWSEM_ACTIVE_MASK 0xffffffffL

line into

# define RWSEM_ACTIVE_MASK 0x3fffffffL

and you're done.
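
A minimal sketch of why the smaller mask is enough, in plain user-space C.
Only RWSEM_ACTIVE_MASK appears in this thread; the derived waiting and
write biases below are assumed to follow the usual rwsem layout
(waiting bias = -mask - 1, write bias = waiting bias + 1), and the helper
names are hypothetical. It checks that each constant, and its negation,
can be encoded as a sign-extended 32-bit immediate.

```c
#include <assert.h>
#include <stdint.h>

/* True if v can be encoded as a sign-extended 32-bit immediate. */
static int fits_simm32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/*
 * Assumed layout, modelled on the usual rwsem bias scheme; only
 * RWSEM_ACTIVE_MASK itself is taken from the thread.
 */
static int64_t waiting_bias(int64_t active_mask)
{
	return -active_mask - 1;
}

static int64_t active_write_bias(int64_t active_mask)
{
	return waiting_bias(active_mask) + 1;	/* + active bias of 1 */
}

/* True if every constant, and its negation, fits in a simm32. */
static int rwsem_constants_fit(int64_t active_mask)
{
	int64_t wait = waiting_bias(active_mask);
	int64_t wr = active_write_bias(active_mask);

	return fits_simm32(active_mask) &&
	       fits_simm32(wait) && fits_simm32(-wait) &&
	       fits_simm32(wr) && fits_simm32(-wr);
}

/* Hypothetical self-check comparing the two masks discussed above. */
static void rwsem_selfcheck(void)
{
	assert(rwsem_constants_fit(0x3fffffffL));	/* shifted-down mask */
	assert(!rwsem_constants_fit(0xffffffffL));	/* original mask */
}
```

Under these assumptions the 0xffffffffL mask fails because the mask
itself, and the negated write bias, are too large for a signed 32-bit
immediate, while every value derived from 0x3fffffffL stays within range.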

The cost of 'adc' may happen to be identical in this case, but I suspect
you didn't test on UP, where the 'lock' prefix goes away. An unlocked
'add' tends to be faster than an unlocked 'adc'.

(It's possible that some micro-architectures don't care, since it's a
memory op, and they can see that 'C' is set. But it's a fragile assumption
that it would always be ok).

Linus