Re: [PATCHv2 00/18] perf tools: Factor ordered samples queue

From: David Ahern
Date: Wed Jun 18 2014 - 15:44:54 EST

Next message: Stephen Warren: "Re: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit"
Previous message: Graham Williams: "[PATCH] regulator: bcm590xx: fix vbus name"
In reply to: Jiri Olsa: "[PATCH 02/18] perf tools: Fix accounting of ordered samples queue"
Next in thread: Jiri Olsa: "Re: [PATCHv2 00/18] perf tools: Factor ordered samples queue"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 6/18/14, 8:58 AM, Jiri Olsa wrote:

hi,
this patchset factors out the session's ordered samples queue
and allows limiting the size of this queue.

v2 changes:
- several small changes for review comments (Namhyung)

The report command queues events until either of the following
conditions is reached:
- PERF_RECORD_FINISHED_ROUND event is processed
- end of the file is reached

Either of the above conditions will force the queue to flush some
events while keeping all allocated memory for subsequent events.

If PERF_RECORD_FINISHED_ROUND is missing, the queue will
allocate memory for every single event in the perf.data file.
This could lead to enormous memory consumption and speed
degradation of the report command for huge perf.data files.
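
To illustrate the behaviour described above, here is a minimal sketch in Python. It is a toy model, not perf's code: the class and function names are hypothetical, FINISHED_ROUND is a stand-in for PERF_RECORD_FINISHED_ROUND, and a round marker is assumed to flush the whole queue, whereas the real queue flushes only some events. It shows that without round markers every event stays queued until the end of the file.

```python
import heapq

FINISHED_ROUND = "FINISHED_ROUND"  # stand-in for PERF_RECORD_FINISHED_ROUND


class OrderedQueue:
    """Toy model of a timestamp-ordered event queue (hypothetical, not perf's code)."""

    def __init__(self):
        self._heap = []      # entries are (timestamp, seq, payload)
        self._seq = 0        # tie-breaker that preserves insertion order
        self.peak = 0        # largest number of events held at once
        self.delivered = []  # events handed to the consumer, in delivery order

    def push(self, timestamp, payload):
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1
        self.peak = max(self.peak, len(self._heap))

    def flush(self):
        # Simplification: deliver everything queued so far, oldest first.
        while self._heap:
            timestamp, _, payload = heapq.heappop(self._heap)
            self.delivered.append((timestamp, payload))


def process(records):
    """Queue (timestamp, payload) records; flush on round markers and at end of file."""
    queue = OrderedQueue()
    for timestamp, payload in records:
        if payload == FINISHED_ROUND:
            queue.flush()
        else:
            queue.push(timestamp, payload)
    queue.flush()  # end of the file is reached
    return queue


samples = [(3, "c"), (1, "a"), (2, "b"), (6, "f"), (4, "d"), (5, "e")]
with_rounds = process(samples[:3] + [(0, FINISHED_ROUND)] + samples[3:])
without_rounds = process(samples)
print("peak with round markers:", with_rounds.peak)
print("peak without round markers:", without_rounds.peak)
```

In this toy input the peak queue length is bounded by the round size when markers are present, and equals the total number of events when they are missing.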

With a queue allocation limit of 100 MB, I got around a
15% speedup when reporting a ~10GB perf.data file.

current code:
Performance counter stats for './perf.old report --stdio -i perf-test.data' (3 runs):

621,685,704,665 cycles ( +- 0.52% )
873,397,467,969 instructions ( +- 0.00% )

286.133268732 seconds time elapsed ( +- 1.13% )

with patches:
Performance counter stats for './perf report --stdio -i perf-test.data' (3 runs):

603,933,987,185 cycles ( +- 0.45% )
869,139,445,070 instructions ( +- 0.00% )

245.337510637 seconds time elapsed ( +- 0.49% )

The speedup seems to come mainly from fewer cycles spent servicing
page faults:

current code:
4.44% 0.01% perf.old [kernel.kallsyms] [k] page_fault

with patches:
1.45% 0.00% perf [kernel.kallsyms] [k] page_fault

current code (faults event):
6,643,807 faults ( +- 0.36% )

with patches (faults event):
2,214,756 faults ( +- 3.03% )
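
As a quick cross-check of the figures quoted above, the relative reductions can be worked out directly. This is only arithmetic on the numbers in the perf stat output; the helper name is made up for the example.

```python
# Figures copied from the perf stat output quoted above.
elapsed_old = 286.133268732  # seconds elapsed, current code
elapsed_new = 245.337510637  # seconds elapsed, with patches
faults_old = 6_643_807       # faults, current code
faults_new = 2_214_756       # faults, with patches


def reduction_pct(old, new):
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


time_saved = reduction_pct(elapsed_old, elapsed_new)
fault_saved = reduction_pct(faults_old, faults_new)
print(f"elapsed time: {time_saved:.1f}% lower")   # 14.3% lower
print(f"page faults:  {fault_saved:.1f}% lower")  # 66.7% lower
```

So the elapsed time drops by a bit over 14%, in line with the "around 15%" figure, while the fault count drops to roughly a third.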

Also, we now have one of our big memory spenders under control,
and the ordered events queue code is put in a separate object
with a clear interface, ready to be used by another command
like script.

Also available here:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/core_ordered_events

I've skimmed through the patches. What happens if you are in the middle of a round and the max queue size is reached?
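
To make the question concrete, here is a toy sketch in Python. It assumes, purely for illustration, that hitting the limit delivers the oldest half of the queue right away; I have not checked that this is what the patches do, and the function and parameter names are made up. Under that assumption, an event read later in the same round with an earlier timestamp ends up delivered out of order.

```python
import heapq


def deliver(events, max_queued=None):
    """Return timestamps in delivery order for a single round.

    events: timestamps in file order; the round is flushed at its end.
    max_queued: hypothetical cap. When the queue grows past it, the oldest
    half is delivered early (an assumed policy, for illustration only).
    """
    heap, out = [], []
    for ts in events:
        heapq.heappush(heap, ts)
        if max_queued is not None and len(heap) > max_queued:
            for _ in range(len(heap) // 2):
                out.append(heapq.heappop(heap))
    while heap:  # end of round: flush whatever is left
        out.append(heapq.heappop(heap))
    return out


def is_sorted(seq):
    return all(a <= b for a, b in zip(seq, seq[1:]))


# Timestamp 5 shows up late in the file, e.g. from another CPU's buffer.
round_events = [10, 11, 12, 13, 5, 14]
print(deliver(round_events))                # [5, 10, 11, 12, 13, 14]
print(deliver(round_events, max_queued=3))  # [10, 11, 5, 12, 13, 14]
```

Without the cap the round comes out sorted; with the cap, 10 and 11 have already been delivered by the time 5 is read.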

I need to find some time for a detailed review and to run through some stress scenarios. A couple that come to mind:
  perf sched record -- perf bench sched pipe
  perf kvm record while booting a nested VM, which causes a lot of VMEXITs

David
