Re: [PATCH 1/2] x86/bitops: implement __test_bit

From: Ingo Molnar
Date: Tue Sep 01 2015 - 05:24:31 EST



* Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:

> I applied this patch on top of mine:

Yeah, looks similar to the one I sent.

> -static inline int __variable_test_bit(long nr, const unsigned long *addr)
> -{
> -	int oldbit;
> -
> -	asm volatile("bt %2,%1\n\t"
> -		     "sbb %0,%0"
> -		     : "=r" (oldbit)
> -		     : "m" (*addr), "Ir" (nr));
> -
> -	return oldbit;
> -}

> And the code size went up:
>
>    text    data     bss     dec     hex filename
>  134836    2997    8372  146205   23b1d arch/x86/kvm/kvm-intel.ko ->
>  134846    2997    8372  146215   23b27 arch/x86/kvm/kvm-intel.ko
>
>  342690   47640     441  390771   5f673 arch/x86/kvm/kvm.ko ->
>  342738   47640     441  390819   5f6a3 arch/x86/kvm/kvm.ko
>
> I tried removing __always_inline, this had no effect.

But code size isn't the only factor.

Uros Bizjak pointed out that the reason GCC does not use the "BT reg,mem"
instruction is that it's highly suboptimal even on recent microarchitectures.
Sandy Bridge is listed as having a 10-cycle latency (!) for this instruction:

http://www.agner.org/optimize/instruction_tables.pdf

This instruction has had bad latency going back to Pentium 4 CPUs.

... so unless something changed in this area with Skylake, I think using the
kernel's __variable_test_bit() code is a bad choice, and looking only at kernel
size is misleading.

It makes sense for atomics, but not for unlocked access.
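
For illustration only, a plain C bit test that avoids the inline asm might
look roughly like the sketch below. The helper name test_bit_plain_c() and the
local BITS_PER_LONG definition are assumptions made for this example, not the
kernel's actual code, and unlike the asm version it takes an unsigned bit
number. Written this way, the compiler is free to pick its own instruction
sequence instead of being forced into the memory-operand form of BT:

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed local definition for this sketch; the kernel has its own. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper, named for this example only.
 * Non-atomic test of bit 'nr' in the bitmap starting at 'addr'.
 */
static inline bool test_bit_plain_c(unsigned long nr, const unsigned long *addr)
{
	/* Select the word that holds bit 'nr', then shift that bit down to bit 0. */
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}
```

Whether this is actually faster than the asm on a given CPU would still need
to be measured; the sketch only shows the shape of the unlocked C variant.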

Thanks,

Ingo