x86: Enhance perf checksum profiling and x86 implementation

From: Neil Horman
Date: Wed Nov 06 2013 - 10:23:50 EST


Hey all-
Sorry for the delay here, but it took me a bit to get the perf bits
working to my satisfaction. As Ingo requested, I added do_csum to the perf
benchmarking utility (as part of the mem suite, since it didn't seem right to
create a separate suite for it). I've also revamped the do_csum routine to do
some smart prefetching, as it yielded slightly better performance than simple
prefetching at a fixed stride:

Without prefetch:
[root@rdma-dev-02 perf]# ./perf bench mem csum -r x86-64-csum -l 1500B -s 512MB -i 1000000 -c
# Running mem/csum benchmark...
# Copying 1500B Bytes ...

0.955977 Cycle/Byte

With prefetch:
[root@rdma-dev-02 perf]# ./perf bench mem csum -r x86-64-csum -l 1500B -s 512MB -i 1000000 -c
# Running mem/csum benchmark...
# Copying 1500B Bytes ...

0.922540 Cycle/Byte


About a 3.5% improvement (0.955977 -> 0.922540 cycles/byte).
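
For anyone who wants a picture of the "simple prefetching at a fixed stride"
baseline mentioned above, here is a rough userspace sketch. To be clear, this
is not the do_csum code from the patch: csum_prefetch, fold64 and
PREFETCH_STRIDE are hypothetical names made up for illustration, the 64-byte
stride is an assumed cache-line size, and __builtin_prefetch is the GCC/Clang
builtin rather than the kernel's prefetch helper. It sums 16-bit words in host
byte order and folds the carries back in, ones'-complement style.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFETCH_STRIDE 64	/* assumption: 64-byte cache lines */

/* Fold a 64-bit accumulator to 16 bits with end-around carry. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: ones'-complement sum with fixed-stride prefetch. */
static uint16_t csum_prefetch(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i = 0;

	while (len - i >= 2) {
		uint16_t w;

		/* Once per stride, hint the next line while it is in bounds. */
		if ((i % PREFETCH_STRIDE) == 0 && i + PREFETCH_STRIDE < len)
			__builtin_prefetch(buf + i + PREFETCH_STRIDE);

		memcpy(&w, buf + i, sizeof(w));	/* alignment-safe load */
		sum += w;
		i += 2;
	}

	if (i < len) {
		/* Trailing odd byte, padded with a zero byte. */
		uint16_t w = 0;

		memcpy(&w, buf + i, 1);
		sum += w;
	}

	return fold64(sum);
}
```

The prefetch is only a hint, so dropping it changes timing but not the
returned sum. A smarter scheme would vary how far ahead it prefetches instead
of using one constant distance.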

Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
CC: sebastien.dugue@xxxxxxxx
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
CC: x86@xxxxxxxxxx
