Re: [PATCH 2/5] bitops: compile time optimization for hweight_long(CONSTANT)

From: Borislav Petkov
Date: Wed Feb 03 2010 - 13:14:22 EST


On Wed, Feb 03, 2010 at 07:42:51AM -0800, Andrew Morton wrote:
> We didn't deal with it on every architecture, which is something which
> the compiler extension takes care of.
>
> In fact I can't find anywhere where we dealt with it on x86.

Yeah, we talked briefly about using hardware popcnt, see thread
beginning at

http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-06/msg00245.html

for example. I did an ftrace of the cpumask_weight() calls in sched.c to
see whether there would be a measurable performance gain, but there
didn't seem to be one at the time. My numbers said something like 170
hweight calls per second, and since the <lib/hweight.c> implementations
roughly translate to ~20 insns (hweight64 to ~30), the whole thing
wasn't worth the trouble: you'd either have to check binutils versions
and slap in opcodes, or use gcc intrinsics, which involves gcc version
checking.
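
For reference, a minimal sketch of the kind of branch-free software
bit count being discussed, written in the style of the generic
lib/hweight.c routines. The names sw_hweight32() and sw_hweight64() are
made up for this example, and the code is an illustration, not a copy
of the kernel source:

```c
#include <stdint.h>

/*
 * Parallel (SWAR) population count: sum adjacent bit fields of
 * width 1, 2, 4, 8 and 16 until the low byte holds the total.
 * sw_hweight32/sw_hweight64 are hypothetical names for this sketch.
 */
unsigned int sw_hweight32(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);          /* 2-bit sums */

	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u); /* 4-bit sums */
	res = (res + (res >> 4)) & 0x0F0F0F0Fu;               /* 8-bit sums */
	res = res + (res >> 8);                               /* 16-bit sums */
	return (res + (res >> 16)) & 0x000000FFu;             /* final count */
}

/* 64-bit variant built from two 32-bit counts. */
unsigned int sw_hweight64(uint64_t w)
{
	return sw_hweight32((uint32_t)(w >> 32)) + sw_hweight32((uint32_t)w);
}
```

Each step is a handful of shifts, masks and adds, which is where an
instruction count in the low tens comes from.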

An alternatives-based solution keyed on the CPUID flag could add the
popcnt opcode without checking any toolchain versions, but what is the
replacement instruction going to look like? Something like

alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)

by making sure the arg is in some register first?
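
To make the idea concrete, here is a rough userspace sketch of the same
decision, assuming gcc or clang on x86. It is not the kernel
alternatives mechanism: alternatives patch the instruction stream once
at boot, while this sketch simply branches at run time on the CPU
feature. The names fallback_hweight32(), hw_popcnt32(),
cpu_has_popcnt() and hweight32_dispatch() are invented for
illustration. The "r" constraints are what "making sure the arg is in
some register first" amounts to here:

```c
#include <stdint.h>

/* Software fallback, same SWAR scheme as the generic routines. */
static unsigned int fallback_hweight32(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);

	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u);
	res = (res + (res >> 4)) & 0x0F0F0F0Fu;
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0x000000FFu;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hardware path: input and output are both forced into registers. */
static unsigned int hw_popcnt32(uint32_t w)
{
	uint32_t res;

	__asm__ ("popcnt %1, %0" : "=r" (res) : "r" (w) : "cc");
	return res;
}

/* Run-time check of the POPCNT CPUID feature bit via a compiler builtin. */
static int cpu_has_popcnt(void)
{
	return __builtin_cpu_supports("popcnt");
}
#else
/* Non-x86 or other compilers: always use the software path. */
static unsigned int hw_popcnt32(uint32_t w)
{
	return fallback_hweight32(w);
}

static int cpu_has_popcnt(void)
{
	return 0;
}
#endif

/* Pick the hardware instruction when available, else the fallback. */
unsigned int hweight32_dispatch(uint32_t w)
{
	return cpu_has_popcnt() ? hw_popcnt32(w) : fallback_hweight32(w);
}
```

Note that the inline asm still relies on the assembler knowing the
popcnt mnemonic, so this sketch does not by itself sidestep the
binutils question; emitting the raw opcode bytes would.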

Hmm..

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center