Re: [PATCH] kprobes - do not allow optimized kprobes in entry code

From: Masami Hiramatsu
Date: Sat Feb 19 2011 - 09:14:51 EST


(2011/02/19 1:26), Jiri Olsa wrote:
[...]
>> The only worry would be that if we move the syscall entry code out of the regular
>> text section fragments the icache layout a tiny bit, possibly hurting performance.
>>
>> It's probably not measurable, but we need to measure it:
>>
>> Testing could be done of some syscall but also cache-intense workload, like
>> 'hackbench 10', via perf 'stat --repeat 30' and have a very close look at
>> instruction cache eviction differences.
>>
>> Perhaps also explicitly enable measure one of these:
>>
>> L1-icache-loads [Hardware cache event]
>> L1-icache-load-misses [Hardware cache event]
>> L1-icache-prefetches [Hardware cache event]
>> L1-icache-prefetch-misses [Hardware cache event]
>>
>> iTLB-loads [Hardware cache event]
>> iTLB-load-misses [Hardware cache event]
>>
>> to see whether there's any statistically significant difference in icache/iTLB
>> evictions, with and without the patch.
>>
>> If such stats are included in the changelog - even if just to show that any change
>> is within measurement accuracy, it would make it easier to apply this change.
>>
>> Thanks,
>>
>> Ingo
>
>
> hi,
>
> I have some results, but need help with interpretation.. ;)
>
> I ran following command (with repeat 100 and 500)
>
> perf stat --repeat 100 -e L1-icache-load -e L1-icache-load-misses -e
> L1-icache-prefetches -e L1-icache-prefetch-misses -e iTLB-loads -e
> iTLB-load-misses ./hackbench/hackbench 10
>
> I can tell just the obvious:
> - the cache load count is higher for the patched kernel,
> but the cache misses count is lower
> - patched kernel has also lower count of prefetches,
> other counts are bigger for patched kernel
>
> there's still some variability in counter values each time I run the perf

Thanks, I've also tested it. (My machine has no support for the
L1-icache-prefetches* events, though.)
What I can tell is that both L1-icache-load and L1-icache-load-misses are
reduced by the patch. ;-)

Thank you,
--------------------------------------------------------------------------
the results for current tip tree are:

$ ./perf stat --repeat 100 -e L1-icache-load -e L1-icache-load-misses \
      -e iTLB-loads -e iTLB-load-misses hackbench 10

 Performance counter stats for 'hackbench 10' (100 runs):

    16,949,055  L1-icache-load           ( +- 0.303% )
     1,237,453  L1-icache-load-misses    ( +- 0.254% )
    40,000,357  iTLB-loads               ( +- 0.257% )
        14,545  iTLB-load-misses         ( +- 0.306% )

   0.171622060  seconds time elapsed     ( +- 0.196% )

$ ./perf stat --repeat 500 -e L1-icache-load -e L1-icache-load-misses \
      -e iTLB-loads -e iTLB-load-misses hackbench 10

 Performance counter stats for 'hackbench 10' (500 runs):

    16,896,081  L1-icache-load           ( +- 0.146% )
     1,234,272  L1-icache-load-misses    ( +- 0.105% )
    39,850,899  iTLB-loads               ( +- 0.116% )
        14,455  iTLB-load-misses         ( +- 0.119% )

   0.171901412  seconds time elapsed     ( +- 0.083% )

--------------------------------------------------------------------------
the results for tip tree with the patch applied are:

$ ./perf stat --repeat 100 -e L1-icache-load -e L1-icache-load-misses \
      -e iTLB-loads -e iTLB-load-misses hackbench 10

 Performance counter stats for 'hackbench 10' (100 runs):

    16,819,190  L1-icache-load           ( +- 0.288% )
     1,162,386  L1-icache-load-misses    ( +- 0.269% )
    40,020,154  iTLB-loads               ( +- 0.254% )
        14,440  iTLB-load-misses         ( +- 0.220% )

   0.169014989  seconds time elapsed     ( +- 0.361% )

$ ./perf stat --repeat 500 -e L1-icache-load -e L1-icache-load-misses \
      -e iTLB-loads -e iTLB-load-misses hackbench 10

 Performance counter stats for 'hackbench 10' (500 runs):

    16,783,970  L1-icache-load           ( +- 0.144% )
     1,155,816  L1-icache-load-misses    ( +- 0.113% )
    39,958,292  iTLB-loads               ( +- 0.122% )
        14,462  iTLB-load-misses         ( +- 0.138% )

   0.168279115  seconds time elapsed     ( +- 0.089% )
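
For anyone who wants to compare the two 500-run result sets numerically, here
is a minimal Python sketch. It is not part of the patch or of perf; the
function name compare() and the noise rule are my own assumptions. It computes
the relative change of each counter and flags it when the change is larger
than the sum of the two "+-" percentages printed by perf stat, which is only a
crude threshold and not a proper statistical test.

```python
# Counter means and "+-" percentages copied from the 500-run results above.
baseline = {
    "L1-icache-load":        (16_896_081, 0.146),
    "L1-icache-load-misses": (1_234_272, 0.105),
    "iTLB-loads":            (39_850_899, 0.116),
    "iTLB-load-misses":      (14_455, 0.119),
}
patched = {
    "L1-icache-load":        (16_783_970, 0.144),
    "L1-icache-load-misses": (1_155_816, 0.113),
    "iTLB-loads":            (39_958_292, 0.122),
    "iTLB-load-misses":      (14_462, 0.138),
}


def compare(base, new):
    """Return {event: (change_pct, noise_pct, outside_noise)}.

    change_pct is the relative change of the patched mean versus the
    baseline mean. noise_pct simply adds the two "+-" figures; that is
    an assumed, deliberately conservative rule of thumb.
    """
    out = {}
    for event, (b_mean, b_err) in base.items():
        n_mean, n_err = new[event]
        change = (n_mean - b_mean) / b_mean * 100.0
        noise = b_err + n_err
        out[event] = (change, noise, abs(change) > noise)
    return out


if __name__ == "__main__":
    for event, (change, noise, outside) in compare(baseline, patched).items():
        verdict = "outside noise" if outside else "within noise"
        print(f"{event:24s} {change:+7.2f}%  (noise ~{noise:.3f}%)  {verdict}")
```

With the numbers above, this reports a drop of roughly 6.4% in
L1-icache-load-misses and about 0.7% in L1-icache-load, while
iTLB-load-misses stays within the combined noise figure.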


--------------------------------------------------------------------------
Here is one processor entry from /proc/cpuinfo on the test machine:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
cpu MHz : 2673.700
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1
sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5347.40
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

--
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx