Re: [PATCH 4/4] perf core: Add backward attribute to perf event

From: Wangnan (F)
Date: Tue Apr 05 2016 - 10:05:43 EST

Next message: Roger Quadros: "[PATCH v6 00/12] USB OTG/dual-role framework"
Previous message: Rob Clark: "Re: [RFC 1/6] drm/fence: add FENCE_FD property to planes"
Next in thread: Wangnan (F): "Re: [PATCH 4/4] perf core: Add backward attribute to perf event"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2016/3/30 10:38, Wangnan (F) wrote:

On 2016/3/30 10:28, Wangnan (F) wrote:

On 2016/3/29 22:04, Peter Zijlstra wrote:

On Mon, Mar 28, 2016 at 06:41:32AM +0000, Wang Nan wrote:

Could you maybe write a perf/tests thingy for this so that _some_
userspace exists that exercises this new code?

int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
+ if (unlikely(is_write_backward(event)))
+ return __perf_output_begin(handle, event, size, true);
return __perf_output_begin(handle, event, size, false);
}

Would something like:

int perf_output_begin(...)
{
if (unlikely(is_write_backward(event))
return perf_output_begin_backward(...);
return perf_output_begin_forward(...);
}

make sense; I'm not sure how much is still using this, but it seems
somewhat excessive to inline two copies of that thing into a single
function.

[SNIP]

Sorry. Your second suggestion seems also good:

My implementation makes a big perf_output_begin(), but introduces only one load and one branch.

Your first suggestion introduces one load, one branch and one function call.

Your second suggestion introduces one load, and at least one (and at most three) branches.

I need some benchmarking result.

Thank you.

No obviously performance divergence among all 3 implementations.

Here are some numbers:

I tested the cost of generating PERF_RECORD_COMM event using prctl with
following code:

...
gettimeofday(&tv1, NULL);
for (i = 0; i < 1000 * 1000 * 3; i++) {
char proc_name[10];

snprintf(proc_name, sizeof(proc_name), "p:%d\n", i);
prctl(PR_SET_NAME, proc_name);
}
gettimeofday(&tv2, NULL);
us1 = tv1.tv_sec * 1000000 + tv1.tv_usec;
us2 = tv2.tv_sec * 1000000 + tv2.tv_usec;
printf("%ld\n", us2 - us1);
...

Run this benchmark 100 time in each experiment. Bind benchmark to core 2
and perf to core 1 to ensure they are on a same CPU.

Result:

BASE : execute without perf
4.5 : pure v4.5
TIP : with only patch 1-3/4 in this patch set applied
BIGFUNC : the implementation in my original patch
FUNCCALL: the implememtation in Peter's first suggestion:
int perf_output_begin(...)
{
if (unlikely(is_write_backward(event))
return perf_output_begin_backward(...);
return perf_output_begin_forward(...);
}
BRANCH : the implememtation in Peter's second suggestion:
int perf_output_begin(...)
{
return __perf_output_begin(..., unlikely(event->attr.backwards));
}

'perf' is executed using:
# perf record -o /dev/null --no-buildid-cache -e syscalls:sys_enter_read ...

Results:

MEAN STDVAR
BASE : 1122968.85 33492.52
4.5 : 2714200.70 26231.69
TIP : 2646260.46 32610.56
BIGFUNC : 2661308.46 52707.47
FUNCCALL: 2636061.10 52607.80
BRANCH : 2651335.74 34910.04

Considering the stdvar, the performance result is nearly identical.

I'd like to choose 'BRANCH' because its code looks better.

Thank you.

Next message: Roger Quadros: "[PATCH v6 00/12] USB OTG/dual-role framework"
Previous message: Rob Clark: "Re: [RFC 1/6] drm/fence: add FENCE_FD property to planes"
Next in thread: Wangnan (F): "Re: [PATCH 4/4] perf core: Add backward attribute to perf event"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]