Re: Does perf-annotate work correctly?

From: Du, Changbin
Date: Tue Sep 26 2017 - 02:13:04 EST



Hi Arnaldo and all,
Is there someone looking at this issue? I can provide more info if you need.
This issue makes it so inconvenient to use perf. Thanks.

On Wed, Sep 13, 2017 at 09:54:25AM +0800, Du, Changbin wrote:
> On Tue, Sep 12, 2017 at 11:33:50AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Sep 12, 2017 at 06:10:35PM +0800, Du, Changbin escreveu:
> > > When a annotate a symbol, I find the annotated C source code doesn't match assembly code.
> > > So I cannot determine which line of C code has much overhead withou gdb's help.
> > >
> > > Here is a example result of function apic_has_interrupt_for_ppr() in kvm module.
> >
> > Ok, was this using the module .ko file or /proc/kcore? You forgot to
> > cut'n'paste the first line on the screen.
> >
> It is arch/x86/kvm/kvm.ko.
>
> > Also, how did you use gdb?
> >
> $ gdb arch/x86/kvm/kvm.ko
> $ (gdb) disassemble /s apic_has_interrupt_for_ppr
>
> > perf uses objdump to do the disassembly, and depending on how it is used
> > (live system, post processing on a different machine, permissions) it
> > may use different files to do the disassembly.
> >
> But objdump has same out as gdb. (Always on same machine, and no binary changed.)
>
> $ objdump -d -S arch/x86/kvm/kvm.o
> ...
> static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
> {
> 3b4a0: e8 00 00 00 00 callq 3b4a5 <apic_has_interrupt_for_ppr+0x5>
> 3b4a5: 55 push %rbp
> 3b4a6: 48 89 e5 mov %rsp,%rbp
> 3b4a9: 48 83 ec 08 sub $0x8,%rsp
> int highest_irr;
> if (kvm_x86_ops->sync_pir_to_irr && apic->vcpu->arch.apicv_active)
> 3b4ad: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 3b4b4 <apic_has_interrupt_for_ppr+0x14>
> 3b4b4: 48 8b 80 38 02 00 00 mov 0x238(%rax),%rax
> 3b4bb: 48 85 c0 test %rax,%rax
> 3b4be: 74 10 je 3b4d0 <apic_has_interrupt_for_ppr+0x30>
> 3b4c0: 48 8b 97 88 00 00 00 mov 0x88(%rdi),%rdx
> 3b4c7: 80 ba 28 03 00 00 00 cmpb $0x0,0x328(%rdx)
> 3b4ce: 75 3a jne 3b50a <apic_has_interrupt_for_ppr+0x6a>
>
> /*
> * Note that irr_pending is just a hint. It will be always
> * true with virtual interrupt delivery enabled.
> */
> if (!apic->irr_pending)
> 3b4d0: 80 bf 91 00 00 00 00 cmpb $0x0,0x91(%rdi)
> 3b4d7: 74 2a je 3b503 <apic_has_interrupt_for_ppr+0x63>
> 3b4d9: 48 8b 8f a0 00 00 00 mov 0xa0(%rdi),%rcx
> static int find_highest_vector(void *bitmap)
> {
> int vec;
> u32 *reg;
>
> for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> 3b4e0: b8 e0 00 00 00 mov $0xe0,%eax
> vec >= 0; vec -= APIC_VECTORS_PER_REG) {
> reg = bitmap + REG_POS(vec);
> if (*reg)
> 3b4e5: 89 c2 mov %eax,%edx
> 3b4e7: c1 fa 05 sar $0x5,%edx
> 3b4ea: c1 e2 04 shl $0x4,%edx
> 3b4ed: 48 63 d2 movslq %edx,%rdx
> 3b4f0: 8b 94 11 00 02 00 00 mov 0x200(%rcx,%rdx,1),%edx
> 3b4f7: 85 d2 test %edx,%edx
> 3b4f9: 75 2d jne 3b528 <apic_has_interrupt_for_ppr+0x88>
>
>
>
> > Please provide more detailed information on the exact command line
> > arguments and usage scenario.
> >
> > - Arnaldo
>
> >
> > > â580 __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention); â
> > > â581 } â
> > > â â
> > > â583 static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr) â
> > > â584 { â
> > > 0.88 â30: cmpb $0x0,0x91(%rdi) â
> > > 2.54 â â je 63 â
> > > 0.20 â mov 0xa0(%rdi),%rcx â
> > > â581 int highest_irr; â
> > > â582 if (kvm_x86_ops->sync_pir_to_irr && apic->vcpu->arch.apicv_active) â
> > > 4.91 â mov $0xe0,%eax x â
> > > 1.46 â45: mov %eax,%edx x â
> > > 0.02 â sar $0x5,%edx x â
> > > 3.57 â shl $0x4,%edx x â
> > > 3.34 â movslq %edx,%rdx x â
> > > 1.25 â mov 0x200(%rcx,%rdx,1),%edx x â
> > > 42.44 â test %edx,%edx x â
> > > 0.01 â âââjne 88 x â
> > > 3.48 â â sub $0x20,%eax x â
> > > 2.24 â â cmp $0xffffffe0,%eax x â
> > > â586âapic_find_highest_irr(): â
> > > â â â
> > > â407â /* â
> > > â408â * Note that irr_pending is just a hint. It will be always â
> > > â409â * true with virtual interrupt delivery enabled. â
> > > â410â */ â
> > > â411â if (!apic->irr_pending) â
> > > â ââ jne 45 â
> > > 0.62 â63:â mov $0xffffffff,%eax â
> > > 0.83 â â leaveq â
> > > 13.52 â ââ retq â
> > > â6a:â mov %esi,-0x4(%rbp) â
> > > â â mov %rdx,%rdi â
> > > â418âfind_highest_vector(): â
> > > â340âstatic int find_highest_vector(void *bitmap) â
> > > â341â{ â
> > > â342â int vec; â
> > > â343â u32 *reg; â
> > > â â â
> > > â345â for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG; â
> > > â ââ callq *%rax â
> > > â â mov -0x4(%rbp),%esi â
> > > â343â vec >= 0; vec -= APIC_VECTORS_PER_REG) { â
> > > â344â reg = bitmap + REG_POS(vec); â
> > > â345â if (*reg) â
> > > 0.05 â75:â cmp $0xffffffff,%eax â
> > > â ââ je 63 â
> > > 1.95 â â mov %eax,%edx â
> > > 1.45 â â and $0xf0,%edx
> > >
> > >
> > > Look at the assembly code block where I have put a 'x' on the right. Apparently the
> > > assembly code doesn't match the C source code arrounded. Let's look the correct disassemble
> > > result from gdb:
> > >
> > > 340 for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> > > 0x000000000003b4e0 <+64>: mov $0xe0,%eax
> > >
> > > 342 reg = bitmap + REG_POS(vec);
> > > 343 if (*reg)
> > > 0x000000000003b4e5 <+69>: mov %eax,%edx
> > > 0x000000000003b4e7 <+71>: sar $0x5,%edx
> > > 0x000000000003b4ea <+74>: shl $0x4,%edx
> > > 0x000000000003b4ed <+77>: movslq %edx,%rdx
> > > 0x000000000003b4f0 <+80>: mov 0x200(%rcx,%rdx,1),%edx
> > > 0x000000000003b4f7 <+87>: test %edx,%edx
> > > 0x000000000003b4f9 <+89>: jne 0x3b528 <apic_has_interrupt_for_ppr+136>
> > >
> > > 341 vec >= 0; vec -= APIC_VECTORS_PER_REG) {
> > > 0x000000000003b4fb <+91>: sub $0x20,%eax
> > >
> > > 340 for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> > > 0x000000000003b4fe <+94>: cmp $0xffffffe0,%eax
> > > 0x000000000003b501 <+97>: jne 0x3b4e5 <apic_has_interrupt_for_ppr+69>
> > >
> > >
> > > Compared to gdb, perf-annoate has messed up. is it a bug or just perf is not as perfect as gdb?
> > >
> > > --
> > > Thanks,
> > > Changbin Du
> >
> >
>
> --
> Thanks,
> Changbin Du



--
Thanks,
Changbin Du

Attachment: signature.asc
Description: PGP signature