Re: [PATCH 2/5] bitops: compile time optimization for hweight_long(CONSTANT)

From: H. Peter Anvin
Date: Fri Feb 05 2010 - 16:59:46 EST


On 02/05/2010 04:11 AM, Borislav Petkov wrote:
> +
> +unsigned int __arch_hweight16(unsigned int w)
> +{
> + unsigned int res = 0;
> +
> + asm volatile("xor %%dh, %%dh\n\t"
> + __arch_hweight_alt(32)
> + : "=di" (res)
> + : "di" (w)
> + : "ecx", "memory");
> +

This is wrong in more ways than I can shake a stick at.

a) "di" doesn't mean the DI register - it means the DX register (d) or
an immediate (i). Since you don't have any reference to either %0 or %1
in your code, you have no way of knowing which one it is. The
constraint for the di register is "D".

b) On 32 bits, the first argument is passed in %eax (with %edx used
for the upper half of a 64-bit argument), but on 64 bits, the first
argument is in %rdi, with the return still in %rax.

c) You call a C function, but you don't clobber the set of registers
that a C function would clobber. You either need to put the function in
an assembly wrapper (which is better in the long run), or clobber the
full set of registers that is clobbered by a C function (which is better
in the short term) -- which is eax, edx, ecx on 32 bits, but rax, rdi,
rsi, rdx, rcx, r8, r9, r10, r11 on 64 bits.

d) On the other hand, you do *not* need a "memory" clobber.

-hpa