Re: [PATCH] reduce inlined x86 memcpy by 2 bytes

From: Denis Vlasenko
Date: Fri Mar 18 2005 - 05:08:57 EST


On Friday 18 March 2005 11:21, Denis Vlasenko wrote:
> This memcpy() is 2 bytes shorter than one currently in mainline
> and it have one branch less. It is also 3-4% faster in microbenchmarks
> on small blocks if block size is multiple of 4. Mainline is slower
> because it has to branch twice per memcpy, both mispredicted
> (but branch prediction hides that in microbenchmark).
>
> Last remaining branch can be dropped too, but then we execute second
> 'rep movsb' always, even if blocksize%4==0. This is slower than mainline
> because 'rep movsb' is microcoded. I wonder, tho, whether 'branchlessness'
> wins over this in real world use (not in bench).
>
> I think blocksize%4==0 happens more than 25% of the time.

s/%4/&3 of course.
--
vda

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/