Re: [RFC] arm: use built-in byte swap function

From: Kim Phillips
Date: Tue Feb 05 2013 - 22:07:28 EST


On Fri, 1 Feb 2013 07:33:17 +0000
"Woodhouse, David" <david.woodhouse@xxxxxxxxx> wrote:

> On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote:
> >
> > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to
> > > a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> > > code added to swab.h below outside of its armv6 protection,
> > > gcc adds calls to __bswapsi2.
> >
> > Take a look at the message I sent on the 29th towards the beginning of
> > this thread for details of gcc 4.5.4 behaviour.
>
> I'd like to see a comment (with PR# if appropriate) explaining clearly
> *why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler.

OK, I think I've figured it out: the defconfigs that fail (at91_dt,
at91sam9g45, and lpc32xx) are all ARM9s (armv4/5), have
CC_OPTIMIZE_FOR_SIZE set, and contain code with multiple swaps that
gcc can optimize for space. With -Os, gcc emits calls to __bswapsi2
on those platforms to save space, because they don't have the
single-instruction rev byte swap.
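
For reference, a minimal C sketch of the two forms involved. The names
swab32_open_coded, swab32_builtin and swab32_forms_agree are hypothetical
and only for illustration. The first is the usual shift-and-mask swap;
the second wraps __builtin_bswap32, which on cores without rev gcc may
either expand inline or turn into a __bswapsi2 call, depending on the
options. Both return the same value; only the generated code differs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: open-coded 32-bit byte swap using shifts and masks. */
static inline uint32_t swab32_open_coded(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* Hypothetical wrapper around the gcc built-in discussed in this thread. */
static inline uint32_t swab32_builtin(uint32_t x)
{
	return __builtin_bswap32(x);
}

/* Sanity check: both forms must agree on the result. */
static inline void swab32_forms_agree(uint32_t x)
{
	assert(swab32_open_coded(x) == swab32_builtin(x));
}
```

Compiling something like this with -S for an armv5 target, with and
without -Os, should show whether the built-in is expanded inline or
left as a call.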

> Russell's test also seemed to indicate that the 32-bit and 64-bit swap
> support was present and functional in GCC 4.5.4 (as indeed it should
> have been since 4.4), so I'm still not quite sure why you require 4.6
> for that.

Initially it was based on looking at the gcc commit history for the
'rev' instruction implementation, but now I've got 4.4, 4.5, 4.6 and
4.7 compilers to perform Russell's test:

$ for cc in 4.4 4.5 4.6 4.7; do \
    arm-linux-gnueabi-gcc-$cc --version | grep gcc ; \
    for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
      echo -n $a:; \
      for f in 16 32 64; do \
        echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | arm-linux-gnueabi-gcc-$cc -w -x c -S -o - - -march=$a | grep 'bl'; \
      done; \
    done; \
  done

whose output is:

arm-linux-gnueabi-gcc-4.4 (Ubuntu/Linaro 4.4.7-1ubuntu2) 4.4.7
armv3:   bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv4:   bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv4t:  bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv5t:  bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv5te: bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv6k:  bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv6:   bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
armv7-a: bl __builtin_bswap16
         bl __bswapsi2
         bl __bswapdi2
arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3
armv3: bl __builtin_bswap16
armv4: bl __builtin_bswap16
armv4t: bl __builtin_bswap16
armv5t: bl __builtin_bswap16
armv5te: bl __builtin_bswap16
armv6k: bl __builtin_bswap16
armv6: bl __builtin_bswap16
armv7-a: bl __builtin_bswap16
arm-linux-gnueabi-gcc-4.6 (Ubuntu/Linaro 4.6.3-8ubuntu1) 4.6.3 20120624 (prerelease)
armv3: bl __builtin_bswap16
armv4: bl __builtin_bswap16
armv4t: bl __builtin_bswap16
armv5t: bl __builtin_bswap16
armv5te: bl __builtin_bswap16
armv6k: bl __builtin_bswap16
armv6: bl __builtin_bswap16
armv7-a: bl __builtin_bswap16
arm-linux-gnueabi-gcc-4.7 (Ubuntu/Linaro 4.7.2-1ubuntu1) 4.7.2
armv3: bl __builtin_bswap16
armv4: bl __builtin_bswap16
armv4t: bl __builtin_bswap16
armv5t: bl __builtin_bswap16
armv5te: bl __builtin_bswap16
armv6k: bl __builtin_bswap16
armv6: bl __builtin_bswap16
armv7-a: bl __builtin_bswap16

So 4.4 should be exempt from using the built-ins because it always
emits __bswapsi2 (and __bswapdi2) calls, whether or not -Os or -O2
is added as an option in the test.

gcc 4.5, 4.6, and 4.7 all support 32 & 64-bit versions, so we
should check for gcc >= 4.5 instead of gcc >= 4.6.
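
Combined with the -Os behaviour described earlier, the conditions could
be expressed roughly as follows. This is only a sketch, not the patch
itself: GCC_VERSION_SKETCH, USE_BUILTIN_BSWAP32_SKETCH, swab32_sketch
and swab32_sketch_check are hypothetical names, and gcc's predefined
__OPTIMIZE_SIZE__ macro stands in for the kernel's CC_OPTIMIZE_FOR_SIZE
option so that the fragment compiles outside the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: gcc version as one number, e.g. 4.5.0 -> 40500. */
#define GCC_VERSION_SKETCH \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)

/*
 * Assumption: __OPTIMIZE_SIZE__ (predefined by gcc under -Os) stands in
 * for the kernel config option in this userspace sketch.
 */
#if GCC_VERSION_SKETCH >= 40500 && !defined(__OPTIMIZE_SIZE__)
#define USE_BUILTIN_BSWAP32_SKETCH 1
#else
#define USE_BUILTIN_BSWAP32_SKETCH 0
#endif

static inline uint32_t swab32_sketch(uint32_t x)
{
#if USE_BUILTIN_BSWAP32_SKETCH
	/* gcc >= 4.5 and not optimizing for size: use the built-in. */
	return __builtin_bswap32(x);
#else
	/* Older gcc or -Os: fall back to the open-coded swap. */
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
#endif
}

/* Sanity check: either path must produce the same swapped value. */
static inline void swab32_sketch_check(void)
{
	assert(swab32_sketch(0x11223344u) == 0x44332211u);
}
```

Either branch returns the same value, so the guard only decides which
code gcc gets to generate.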

I've added a new check for !CC_OPTIMIZE_FOR_SIZE and build-tested all
defconfigs with gcc 4.6.3 - here's v5: