On Sat, 19 Sep 2009 14:11:38 +0300
Avi Kivity <avi@xxxxxxxxxx> wrote:

> On 09/19/2009 12:19 PM, Arjan van de Ven wrote:
>> ... call me skeptical. My gut feeling is that you'll
>> get killed by communication overhead.
>> (sort seems to be more communication than raw cpu use)
>
> Why? a sort that fits in memory is purely cpu and memory access.

memory access == communication due to cache line bounces....

> However, sort *would* benefit, and some UCLA students implemented
> parallel sort for a term project. Unfortunately, the project is stalled
> because the implementation was not efficient enough, and no one
> has found the time to improve it since.
>
> Instead of O(N log N) you'd get K * O(N/K log N/K) followed by an
> O(N) merge. For large N and small K, you get a speedup of roughly K
> (since the O(N) merge is dominated by the preceding sort).

doing big-O arithmetic and then adding constant divisions/multipliers is
rather.. dangerous ;)
You actually get K * C1 * O(N/K log N/K) + C2 * O(N)
where C1/C2 is the actual cost of the extra intra-cpu communication.
(and for most sorting, I suspect the communication costs dominate over
CPU cost)
I'd be happy to be proven wrong...
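
For reference, a minimal sketch (in Python, for illustration; not code from this thread) of the split/sort/merge scheme being debated: the input is cut into K chunks, each sorted independently (the K * O(N/K log N/K) term), then combined with a single serial K-way merge (the C2 * O(N) term, where the inter-process communication cost is paid):

```python
# Hypothetical sketch of K-way parallel sort: K independent chunk
# sorts in worker processes, followed by one serial O(N) merge.
import heapq
import random
from multiprocessing import Pool

def parallel_sort(data, k=4):
    n = len(data)
    # Split into K roughly equal chunks.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    # K independent sorts; sending chunks to workers and getting the
    # sorted results back is where the communication cost shows up.
    with Pool(k) as pool:
        sorted_chunks = pool.map(sorted, chunks)
    # Single O(N) K-way merge of the sorted runs.
    return list(heapq.merge(*sorted_chunks))

if __name__ == "__main__":
    data = [random.randrange(10**6) for _ in range(100_000)]
    assert parallel_sort(data) == sorted(data)
```

Note that timing this against a plain `sorted(data)` is exactly the experiment that would settle the argument: the merge and the pickling of chunks across process boundaries are pure overhead that the big-O analysis hides.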