Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods

From: Mathieu Desnoyers
Date: Fri Jun 19 2009 - 11:51:35 EST


* Ingo Molnar (mingo@xxxxxxx) wrote:
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Mon, 15 Jun 2009, Ingo Molnar wrote:
> > >
> > > See the numbers in the other mail: about 33 million pagefaults
> > > happen in a typical kernel build - that's ~400K/sec - and that
> > > is not a particularly really pagefault-heavy workload.
> >
> > Did you do any function-level profiles?
> >
> > Last I looked at it, the real cost of page faults were all in the
> > memory copies and page clearing, and while it would be nice to
> > speed up the kernel entry and exit, the few tens of cycles we
> > might be able to get from there really aren't all that important.
>
> Yeah.
>
> Here's the function level profiles of a typical kernel build on a
> Nehalem box:
>
> $ perf report --sort symbol
>
> #
> # (14317328 samples)
> #
> # Overhead Symbol
> # ........ ......
> #
> 44.05% 0x000000001a0b80

It makes me wonder how the following scenario is accounted for:

- Execution of an instruction in a newly forked/exec'd process causes a
  page fault (traps, faults and interrupts can take roughly 2000 cycles
  to execute).
- PC sampling interrupt fires.

Will it account the execution time as part of user-space or
kernel-space execution?

Depending on how the sampling mechanism determines whether it
interrupted kernel mode or user mode, this might make the userspace PC
appear to be currently running even though the current execution
context is the very beginning of the page fault handler (the first
instruction servicing the fault).
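
To make the question concrete, below is a simplified, self-contained
model of the kind of check I assume such a mechanism performs (the
struct, helper and segment selector values are made up for
illustration; the real perf_counter code presumably looks at the
pt_regs saved when the NMI fired). If the decision is based on the
privilege level of the interrupted frame, a sample landing on the very
first instruction of the fault handler is already charged to the
kernel, even though the faulting user instruction is the logical cause
of the work:

/*
 * Simplified, self-contained model of the classification in question --
 * NOT the actual perf_counter code.  It assumes the sampler looks at
 * the register frame it interrupted and uses the privilege level in CS
 * to decide whether the sample belongs to user or kernel space.
 */
#include <stdio.h>

struct sample_regs {
	unsigned long ip;	/* PC at the moment the sampling NMI fired */
	unsigned short cs;	/* code segment of the interrupted context */
};

/* x86: the low two bits of CS hold the privilege level (3 == user). */
static int sample_in_user_mode(const struct sample_regs *regs)
{
	return (regs->cs & 3) == 3;
}

int main(void)
{
	/*
	 * NMI lands on the first instruction of the fault handler: CS is
	 * already a kernel segment (0x10 is a typical kernel CS), so the
	 * sample is charged to the kernel.
	 */
	struct sample_regs fault_handler_entry = { 0xffffffff81234000UL, 0x10 };

	/* NMI lands while the user instruction is still executing
	 * (0x33 is a typical 64-bit user CS). */
	struct sample_regs user_insn = { 0x00000000004005d0UL, 0x33 };

	printf("fault handler entry: %s\n",
	       sample_in_user_mode(&fault_handler_entry) ? "user" : "kernel");
	printf("user instruction:    %s\n",
	       sample_in_user_mode(&user_insn) ? "user" : "kernel");
	return 0;
}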

Mathieu

> 5.09% 0x0000000001d298
> 3.56% 0x0000000005742c
> 2.48% 0x0000000014026d
> 2.31% 0x00000000007b1a
> 2.06% 0x00000000115ac9
> 1.83% [.] _int_malloc
> 1.71% 0x00000000064680
> 1.50% [.] memset
> 1.37% 0x00000000125d88
> 1.28% 0x000000000b7642
> 1.17% [k] clear_page_c
> 0.87% [k] page_fault
> 0.78% [.] is_defined_config
> 0.71% [.] _int_free
> 0.68% [.] __GI_strlen
> 0.66% 0x000000000699e8
> 0.54% [.] __GI_memcpy
>
> Most is dominated by user-space symbols. (no proper ELF+debuginfo on
> this box so they are unnamed.) It also shows that page clearing and
> pagefault handling dominate the kernel overhead - but are dwarfed by
> other overhead. Any page-fault-entry costs are a drop in the bucket.
>
> In fact with call-chain graphs we can get a precise picture, as we
> can do a non-linear 'slice' set operation over the samples and
> filter out the ones that have the 'page_fault' pattern in one of
> their parent functions:
>
> $ perf report --sort symbol --parent page_fault
>
> #
> # (14317328 samples)
> #
> # Overhead Symbol
> # ........ ......
> #
> 1.12% [k] clear_page_c
> 0.87% [k] page_fault
> 0.43% [k] get_page_from_freelist
> 0.25% [k] _spin_lock
> 0.24% [k] do_page_fault
> 0.23% [k] perf_swcounter_ctx_event
> 0.16% [k] perf_swcounter_event
> 0.15% [k] handle_mm_fault
> 0.15% [k] __alloc_pages_nodemask
> 0.14% [k] __rmqueue
> 0.12% [k] find_get_page
> 0.11% [k] copy_page_c
> 0.11% [k] find_vma
> 0.10% [k] _spin_lock_irqsave
> 0.10% [k] __wake_up_bit
> 0.09% [k] _spin_unlock_irqrestore
> 0.09% [k] do_anonymous_page
> 0.09% [k] __inc_zone_state
>
> This "sub-profile" shows the true summary overhead that 'page_fault'
> and all its child functions have. Note that for example clear_page_c
> decreased from 1.17% to 1.12%:
>
> 1.12% [k] clear_page_c
> 1.17% [k] clear_page_c
>
> because there's 0.05% of other callers to clear_page_c() that do not
> involve page_fault. Those are filtered out via --parent
> filtering/matching.
>
> Ingo
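
For reference, the --parent "slice" operation described above can be
pictured as a per-sample predicate over the recorded call chain: a
sample is kept only if one of its caller symbols matches the pattern.
A minimal sketch of that idea follows (the types and names are made up
for illustration; this is not how perf report is actually implemented):

#include <stdio.h>
#include <string.h>

struct sample {
	const char *symbol;	/* symbol the sample hit */
	const char **parents;	/* caller symbols, innermost first */
	int nr_parents;
};

/* Keep a sample only if one of its callers matches @pattern. */
static int sample_matches_parent(const struct sample *s, const char *pattern)
{
	int i;

	for (i = 0; i < s->nr_parents; i++)
		if (strstr(s->parents[i], pattern))
			return 1;
	return 0;
}

int main(void)
{
	const char *chain[] = { "do_page_fault", "page_fault" };
	struct sample s = { "clear_page_c", chain, 2 };

	/* Analogous to: perf report --sort symbol --parent page_fault */
	printf("%s: %s\n", s.symbol,
	       sample_matches_parent(&s, "page_fault") ? "kept" : "filtered out");
	return 0;
}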

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68