Re: Big git diff speedup by avoiding x86 "fast string" memcmp

From: Nick Piggin
Date: Thu Dec 09 2010 - 23:28:12 EST


On Thu, Dec 09, 2010 at 06:09:38PM +1100, Nick Piggin wrote:
> So replace it with an open-coded byte comparison. This increases code
> size by 24 bytes in the critical __d_lookup_rcu function, but the

Actually, if the loop assumes len is non zero (which is the case for
dentry compare), then the bloat is only 8 bytes, so not a problem.
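
For illustration only, here is a minimal sketch of what an open-coded
compare that assumes a non-zero length could look like. This is not the
actual patch: the name name_cmp_nonzero and its return convention (0 on
match, 1 on mismatch) are made up for this example. Because len is assumed
to be at least 1, the length test can sit at the bottom of the loop, which
is presumably where the saved bytes come from.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the code from the patch: byte-wise compare
 * that assumes len > 0, so no length check is needed before the first
 * iteration. Returns 0 if the first len bytes match, 1 otherwise.
 */
static int name_cmp_nonzero(const unsigned char *a,
			    const unsigned char *b, size_t len)
{
	do {
		if (*a != *b)
			return 1;	/* first differing byte: mismatch */
		a++;
		b++;
	} while (--len);		/* caller guarantees len was >= 1 */

	return 0;
}
```

Calling this with len == 0 would be a bug (the counter would wrap), so the
caller has to guarantee the non-zero length, as the dentry compare does.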

Also got numbers versus vanilla kernel, out of interest.

> speedup is huge, averaging 10 runs of each:
>
> git diff st   user   sys    elapsed  CPU
vanilla         1.19   3.21   4.47     98.0
> before        1.15   2.57   3.82     97.1
> after         1.14   2.35   3.61     96.8
>
> git diff mt   user   sys    elapsed  CPU
vanilla         1.57   45.75  3.60     1312
> before        1.27   3.85   1.46     349
> after         1.26   3.54   1.43     333
>

Single-thread elapsed time improvement, vanilla vs. vfs: 19.23%. Not quite
as big as the AMD fam10h speedup; that's probably because Westmere does
atomics so damn quickly.

Multi thread numbers are no surprise.
