Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT

From: Ingo Molnar
Date: Tue Jan 20 2009 - 15:57:42 EST



* Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:

> Ingo Molnar wrote:
>> * Ingo Molnar <mingo@xxxxxxx> wrote:
>>
>>
>>>> The times are, I believe, in nanoseconds for lmbench; in any case,
>>>> lower is better.
>>>>
>>>> non pv AVG=464.22 STD=5.56
>>>> paravirt AVG=502.87 STD=7.36
>>>>
>>>> Nearly 10% performance drop here, which is quite a bit... hopefully
>>>> people are testing the speed of their PV implementations against
>>>> non-PV bare metal :)
>>>>
>>> Ouch, that looks unacceptably expensive. All the major distros turn
>>> CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the
>>> express promise to have no measurable runtime overhead.
>>>
>>
>> Here are some more precise stats, gathered via hardware counters on a
>> perfcounters kernel using 'timec', running a modified version of the
>> 'mmap performance stress-test' app I made years ago.
>>
>> The MM benchmark app can be downloaded from:
>>
>> http://redhat.com/~mingo/misc/mmap-perf.c
>>
>> timec.c can be picked up from:
>>
>> http://redhat.com/~mingo/perfcounters/timec.c
>>
>> mmap-perf conducts 1 million mmap()/munmap()/mremap() calls and, with a
>> certain probability, also touches the mapped area. The patterns are
>> pseudo-random, and the random seed is initialized to the same value, so
>> repeated runs produce the exact same mmap sequence.
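
[ Not the actual mmap-perf.c -- just a minimal sketch of the pattern
  described above; the iteration count, size range and the touch/resize
  probabilities here are made up for illustration: ]

#define _GNU_SOURCE		/* for mremap(), MAP_ANONYMOUS */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define ITERATIONS	1000000

int main(void)
{
	srandom(1);		/* fixed seed: every run replays the same sequence */

	for (long i = 0; i < ITERATIONS; i++) {
		size_t size = ((size_t)(random() % 256) + 1) * 4096;
		char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			continue;

		if (random() % 4 == 0)		/* sometimes touch the area */
			memset(p, 0, size);

		if (random() % 4 == 0) {	/* sometimes resize the mapping */
			size_t new_size = ((size_t)(random() % 256) + 1) * 4096;
			char *q = mremap(p, size, new_size, MREMAP_MAYMOVE);
			if (q != MAP_FAILED) {
				p = q;
				size = new_size;
			}
		}
		munmap(p, size);
	}
	return 0;
}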
>>
>> I ran the test with a single thread and bound to a single core:
>>
>> # taskset 2 timec -e -5,-4,-3,0,1,2,3 ./mmap-perf 1
>>
>> [ I ran it as root - so that kernel-space hardware-counter statistics
>> are included as well. ]
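
[ For reference, the same binding can be done from inside the program;
  a minimal sketch using sched_setaffinity() -- "taskset 2" means CPU
  mask 0x2, i.e. CPU #1: ]

#define _GNU_SOURCE		/* for cpu_set_t / sched_setaffinity() */
#include <sched.h>
#include <stdio.h>

static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* pid 0: this process */
}

int main(void)
{
	if (pin_to_cpu(1))	/* "taskset 2" == CPU mask 0x2 == CPU #1 */
		perror("sched_setaffinity");

	/* ... run the benchmark loop here ... */
	return 0;
}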
>>
>> The results are quite surprising, and they show the true cost that
>> paravirt_ops adds to the native kernel (CONFIG_PARAVIRT=y):
>>
>> -----------------------------------------------
>> | Performance counter stats for './mmap-perf' |
>> -----------------------------------------------
>> | |
>> |  x86-defconfig |     PARAVIRT=y
>> |------------------------------------------------------------------
>> |
>> |   1311.554526 |   1360.624932   task clock ticks (msecs)    +3.74%
>> |
>> |              1 |              1   CPU migrations
>> |             91 |             79   context switches
>> |          55945 |          55943   pagefaults
>> | ............................................
>> |     3781392474 |     3918777174   CPU cycles                  +3.63%
>> |     1957153827 |     2161280486   instructions               +10.43%
>>
>
> !!
>
>> |       50234816 |       51303520   cache references            +2.12%
>> |        5428258 |        5583728   cache misses                +2.86%
>>
>
> Is this I or D, or combined?

That's last-level-cache references+misses (L2 cache):

 Bit Position    Event Name       UMask   Event Select
 CPUID.AH.EBX
            3    LLC Reference      4FH            2EH
            4    LLC Misses         41H            2EH
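
[ To count exactly these two events today, one can use the
  perf_event_open() interface that this perfcounters work eventually
  became -- a minimal sketch, not the 2009 prototype syscall that timec
  used; the raw configs are (UMask << 8 | Event Select) from the table
  above, i.e. 0x4f2e and 0x412e: ]

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_raw_counter(uint64_t config)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type     = PERF_TYPE_RAW;
	attr.size     = sizeof(attr);
	attr.config   = config;		/* UMask << 8 | Event Select */
	attr.disabled = 1;
	/* exclude_kernel stays 0 so kernel-side work is counted too;
	   as with timec above, that needs root (or a relaxed
	   perf_event_paranoid setting) */

	/* no glibc wrapper: this process, any CPU, no group, no flags */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
	int refs = open_raw_counter(0x4f2e);	/* LLC Reference: 4FH, 2EH */
	int miss = open_raw_counter(0x412e);	/* LLC Misses:    41H, 2EH */
	uint64_t count;

	ioctl(refs, PERF_EVENT_IOC_ENABLE, 0);
	ioctl(miss, PERF_EVENT_IOC_ENABLE, 0);

	/* ... workload to measure, e.g. the mmap/munmap loop ... */

	ioctl(refs, PERF_EVENT_IOC_DISABLE, 0);
	ioctl(miss, PERF_EVENT_IOC_DISABLE, 0);

	read(refs, &count, sizeof(count));
	printf("LLC references: %llu\n", (unsigned long long)count);
	read(miss, &count, sizeof(count));
	printf("LLC misses:     %llu\n", (unsigned long long)count);
	return 0;
}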

Ingo