Re: [benchmark] 1% performance overhead of paravirt_ops on native kernels
From: Jeremy Fitzhardinge
Date: Tue May 26 2009 - 14:42:34 EST
Ingo Molnar wrote:
> I did more 'perf stat mmap-perf 1' measurements (bound to a single
> core, running single thread - to exclude cross-CPU noise), which in
> essence measure CONFIG_PARAVIRT=y overhead on native kernels:
Thanks for taking the time to make these measurements. You'll agree
they're much better numbers than the last time you ran these tests?
> Performance counter stats for './mmap-perf':
>
>                   [vanilla]      [PARAVIRT=y]
>
>     1230.805297     1242.828348   task clock ticks   (msecs)    + 0.97%
>      3602663413      3637329004   CPU cycles         (events)   + 0.96%
>      1927074043      1958330813   instructions       (events)   + 1.62%
> That's around 1% on really fast hardware (Core2 E6800 @ 2.93 GHz,
> 4MB L2 cache), i.e. still significant overhead. Distros generally
> enable CONFIG_PARAVIRT, even though a large majority of users never
> actually run such kernels as Xen guests.
Did you do only a single run, or is this the result of multiple runs?
If the latter, what was your procedure? How did you control for page
placement, cache effects and other boot-to-boot variations?
Your numbers are not dissimilar to my measurements, but I also saw up to
1% performance improvement vs native from boot to boot (I saw up to 10%
reduction of cache misses with pvops, possibly because of its
de-inlining effects).
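
To make the de-inlining point concrete, here's a deliberately
simplified userspace model of the indirection CONFIG_PARAVIRT adds on
a hot pte update. The real kernel routes this through pv_mmu_ops and
patches most call sites at boot, so treat the names and structure
below as illustrative only, not the actual pvops code:

/* Simplified userspace model of the CONFIG_PARAVIRT indirection;
 * all names here are made up for illustration. */
#include <stdio.h>

typedef unsigned long pteval_t;

/* Native build: the pte write is a trivial store the compiler inlines. */
static inline void native_set_pte(pteval_t *ptep, pteval_t val)
{
        *ptep = val;
}

/* pvops build: the same store goes through a function pointer, so it
 * can't be inlined; each call is an indirect branch plus an
 * out-of-line function body, which changes code layout and hence
 * cache behaviour. */
struct pv_ops_model {
        void (*set_pte)(pteval_t *ptep, pteval_t val);
};

static void model_set_pte(pteval_t *ptep, pteval_t val)
{
        *ptep = val;
}

static struct pv_ops_model pv_ops = { .set_pte = model_set_pte };

int main(void)
{
        pteval_t pte = 0;

        native_set_pte(&pte, 0x1);      /* inlined: a single store */
        pv_ops.set_pte(&pte, 0x2);      /* indirect call */
        printf("pte = %#lx\n", pte);
        return 0;
}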
I also saw about 1% boot-to-boot variation with the non-pvops kernel.
While pvops does add *some* overhead, I think its absolute magnitude
is swamped by the noise. The best we can say is "somewhere under 1%
on modern hardware".
> Are there plans to analyze and fix this overhead too, beyond the
> paravirt-spinlocks overhead you analyzed? (Note that I had
> CONFIG_PARAVIRT_SPINLOCKS disabled in this test.)
>
> I think only those users who actually run such kernels in a
> virtualized environment should get this overhead.
>
> I cannot cite a single other kernel feature that has so much
> performance impact when runtime-disabled. For example, an
> often-cited bloat and overhead source is CONFIG_SECURITY=y.
Your particular benchmark does many, many mmap/mprotect/munmap/mremap
calls, and takes a lot of pagefaults. That's going to hit the hot path
with lots of pte updates and so on, but very few security hooks. How
does it compare with a more balanced workload?
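
For what it's worth, the shape I mean is something like the sketch
below. I don't have the mmap-perf source in front of me, so the sizes
and iteration count are invented; it's only an illustration of the
workload class, not the actual test:

/* Hypothetical sketch of an mmap-perf-style inner loop; not the
 * actual benchmark. Sizes and iteration counts are invented. */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        size_t len = 64 * (size_t)page;

        for (int i = 0; i < 100000; i++) {
                char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED)
                        abort();

                /* First touch of every page is a pagefault and a pte
                 * update - exactly the pvops-affected hot path. */
                memset(p, 1, len);

                /* More pte-heavy work; no security hooks to speak of. */
                mprotect(p, len, PROT_READ);
                munmap(p, len);
        }
        return 0;
}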
> CONFIG_SECURITY's runtime overhead (same system, same workload) is:
>
>                   [vanilla]      [SECURITY=y]
>
>     1219.652255     1230.805297   task clock ticks   (msecs)    + 0.91%
>      3574548461      3602663413   CPU cycles         (events)   + 0.78%
>      1915177924      1927074043   instructions       (events)   + 0.62%
>
> ( With the difference that the distros that enable CONFIG_SECURITY=y
>   tend to install and use at least one security module by default. )
>
> So everyone who runs a CONFIG_PARAVIRT=y distro kernel pays this 1%
> overhead in this mmap-test workload - even if no Xen is used on that
> box, ever.
So you're saying that:

 * CONFIG_SECURITY adding +0.91% to wallclock time is OK, but pvops
   adding +0.97% is not,
 * your test is sensitive enough to make that 0.06% difference (0.97%
   vs 0.91%; see the arithmetic below) significant, and
 * this benchmark is representative enough of real workloads that its
   results are overall meaningful?
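
For reference, here is a trivial recomputation of those wallclock
deltas from the task-clock figures quoted above; it's only
arithmetic, but it makes the scale of that 0.06% gap explicit:

/* Recomputing the task-clock deltas from the two tables above; the
 * input figures are Ingo's, only the arithmetic is mine. */
#include <stdio.h>

int main(void)
{
        double pv_base  = 1230.805297;  /* [vanilla], PARAVIRT table */
        double pv       = 1242.828348;  /* [PARAVIRT=y]              */
        double sec_base = 1219.652255;  /* [vanilla], SECURITY table */
        double sec      = 1230.805297;  /* [SECURITY=y]              */

        double d_pv  = 100.0 * (pv / pv_base - 1.0);
        double d_sec = 100.0 * (sec / sec_base - 1.0);

        printf("pvops:    +%.2f%%\n", d_pv);
        printf("security: +%.2f%%\n", d_sec);
        /* The gap is ~0.06 percentage points - far smaller than the
         * ~1% boot-to-boot variation both of us have seen. */
        printf("gap:       %.2f points\n", d_pv - d_sec);
        return 0;
}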
> Config attached.
Is this derived from a RH distro config?
J