This is the only way to do it on the Alpha, as there is no
processor status word in which to store a carry flag.
> does not need branches or other pipeline killers at all on most CPUs.
> In fact, on MIPS this even makes really nice code, which takes four
> cycles per 4 bytes on 32-bit CPUs or four cycles per 8 bytes on R4000
> (64-bit) CPUs; R5000/R10000 should be able to do this in 2 cycles.
Or look at the csum_ipv6_magic routine I wrote for 2.1.11, wherein
all the carry bits for the 40-byte headers are merged in 3 (ev5)
cycles.
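
For anyone who has not seen the trick, here is a rough sketch in C of
summing with no carry flag, assuming "it" above means the ones'
complement checksum add. The carry-out of an unsigned add is recovered
with a compare (which is what cmpult gives you on the Alpha) and added
back in. The names add_carry64, fold64 and csum_words are made up for
illustration; this is not the kernel's csum_ipv6_magic, and it ignores
alignment, byte order and trailing bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit add with end-around carry.
 * No flags register needed; the carry is the result of a compare. */
static uint64_t add_carry64(uint64_t sum, uint64_t word)
{
    sum += word;
    /* The add wrapped iff the result is smaller than an operand.
     * Adding the carry back cannot wrap a second time. */
    return sum + (sum < word);
}

/* Hypothetical helper: fold a 64-bit ones' complement sum to 16 bits.
 * Each width is folded twice so the carry out of the first fold is
 * absorbed by the second. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum n 64-bit words and return the folded (not inverted) result. */
uint16_t csum_words(const uint64_t *w, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum = add_carry64(sum, w[i]);
    return fold64(sum);
}
```

The loop body is an add, a compare and another add, with no branch on
the carry, so the compiler has nothing to mispredict.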
r~