Re: [PATCH] x86: Run checksumming in parallel across multiple ALUs

From: Ingo Molnar
Date: Fri Nov 01 2013 - 05:21:56 EST



* Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:

> Prefetch and simulated adcx/adox from above:
> Performance counter stats for './test.sh' (20 runs):
>
>     35,704,331 L1-dcache-load-misses               ( +- 0.07% ) [75.00%]
>              0 L1-dcache-prefetches                             [75.00%]
> 19,751,409,264 cycles                 # 0.000 GHz  ( +- 0.59% ) [75.00%]
>     34,850,056 branch-misses                       ( +- 1.29% ) [75.00%]
>
> 7.768602160 seconds time elapsed ( +- 1.38% )

Btw., you might also want to try measuring only the basics:

  -e cycles -e instructions -e branches -e branch-misses

That should give you 100% in the last column (no event multiplexing,
so nothing is scaled), and it also lets you double-check whether all
the PMU counts are correct: is it the expected number of instructions,
the expected number of branches, the expected number of branch-misses,
etc.
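
To make the "last column" check mechanical, here is a minimal Python sketch that reads perf stat text output and lists the events whose bracketed percentage is below 100. The regular expressions are assumptions based only on the output format quoted above (other perf versions may print this column differently), and the names parse_perf_stat and multiplexed_events are hypothetical helpers, not part of perf.

```python
import re

# Assumed line shape, taken from the quoted output above:
#   <count with commas> <event-name> ... [<percent>%]
EVENT_RE = re.compile(r"^([\d,]+)\s+([A-Za-z][\w-]*)")
COVERAGE_RE = re.compile(r"\[\s*(\d+(?:\.\d+)?)%\]\s*$")

# The event lines quoted earlier in this thread, used as sample input.
SAMPLE = """\
35,704,331 L1-dcache-load-misses ( +- 0.07% ) [75.00%]
0 L1-dcache-prefetches [75.00%]
19,751,409,264 cycles # 0.000 GHz ( +- 0.59% ) [75.00%]
34,850,056 branch-misses ( +- 1.29% ) [75.00%]
7.768602160 seconds time elapsed ( +- 1.38% )
"""


def parse_perf_stat(text):
    """Return {event: (count, coverage_percent or None)} for event lines."""
    stats = {}
    for raw in text.splitlines():
        # Tolerate mail quoting ("> ") and leading indentation.
        line = re.sub(r"^[>\s]+", "", raw)
        match = EVENT_RE.match(line)
        if not match:
            continue  # header, blank line, or the "seconds time elapsed" line
        count = int(match.group(1).replace(",", ""))
        coverage = COVERAGE_RE.search(line)
        percent = float(coverage.group(1)) if coverage else None
        stats[match.group(2)] = (count, percent)
    return stats


def multiplexed_events(text):
    """List events whose coverage column is present and below 100%."""
    return [
        event
        for event, (_, percent) in parse_perf_stat(text).items()
        if percent is not None and percent < 100.0
    ]
```

Calling multiplexed_events(SAMPLE) flags all four events from the quoted run, since each was only counted 75% of the time; a run restricted to the four basic events should return an empty list.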

Then you can remove branch stats and add just L1-dcache stats - and
still be 100% covered:

-e cycles -e instructions -e L1-dcache-loads -e L1-dcache-load-misses

etc.
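
The two event sets above can be kept side by side in a small script so each run stays within what the PMU can count at once. This is only a sketch: the limit of four events per run is an assumption read off the two lists above (real counter budgets depend on the CPU), the 20 repeats mirror the "(20 runs)" in the quoted output, and perf_stat_argv is a hypothetical helper that only builds the command line.

```python
import shlex

# The two event sets suggested above.
BASICS = ["cycles", "instructions", "branches", "branch-misses"]
CACHE = ["cycles", "instructions", "L1-dcache-loads", "L1-dcache-load-misses"]


def perf_stat_argv(events, workload, runs=20, max_events=4):
    """Build a `perf stat` argument list for one event set.

    max_events is an assumed budget, not something perf reports.
    """
    if len(events) > max_events:
        raise ValueError(
            f"{len(events)} events requested, assumed budget is {max_events}"
        )
    argv = ["perf", "stat", "-r", str(runs)]  # -r repeats the workload
    for event in events:
        argv += ["-e", event]
    argv += shlex.split(workload)
    return argv
```

On a machine with perf installed, the returned list can be handed to subprocess.run, once for BASICS and once for CACHE, and the two outputs compared on the shared cycles and instructions counts.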

Just so that you can trust what the PMU tells you. Prefetch counts
are sometimes off; they might include speculative activity, etc.

Thanks,

Ingo