Re: [RFC][PATCH] perf_events, x86: PEBS support

From: Stephane Eranian
Date: Wed Feb 03 2010 - 09:08:53 EST


On Wed, Feb 3, 2010 at 2:56 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2010-02-03 at 14:22 +0100, Stephane Eranian wrote:
>> In general, there are some problems with the PEBS buffer when
>> used in system-wide mode. If the depth is > 1, then you have a
>> problem attributing samples to pid,tid.
>>
>> Looks like this patch hardcodes the depth and threshold of the buffer.
>> I believe you need to add some flexibility in there.
>
> Sure you can, just drain the buffers on context switch. (You'll see that
> placing x86_pmu.drain_pebs() calls is one of the missing pieces).

I was talking about cpu-wide mode, where nothing is drained today.
If drain_pebs() were called on context switch-out, then yes, this would
work.
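
To make the attribution problem concrete, here is a minimal user-space
sketch (not kernel code, and not what the patch does). It assumes a
hypothetical per-CPU buffer of depth 4 whose records carry only an IP, so
the pid has to be filled in at drain time from whatever task is current.
All names (cpu_state, pebs_record, context_switch) are made up for
illustration; only drain_pebs() echoes the callback name used above.

```c
#include <assert.h>
#include <stddef.h>

#define PEBS_DEPTH 4   /* hypothetical buffer depth (> 1) */
#define MAX_OUT    64  /* capacity of the illustrative output array */

struct pebs_rec { unsigned long ip; };          /* hardware record: no pid/tid */
struct sample   { unsigned long ip; int pid; }; /* what the tool wants to see */

struct cpu_state {
	struct pebs_rec buf[PEBS_DEPTH];
	size_t nrec;                /* records currently buffered */
	int cur_pid;                /* task currently running on this CPU */
	struct sample out[MAX_OUT];
	size_t nout;
};

/* Attribute every buffered record to the task that is current right now. */
static void drain_pebs(struct cpu_state *c)
{
	for (size_t i = 0; i < c->nrec; i++) {
		assert(c->nout < MAX_OUT);
		c->out[c->nout].ip = c->buf[i].ip;
		c->out[c->nout].pid = c->cur_pid;
		c->nout++;
	}
	c->nrec = 0;
}

/* "Hardware" appends a record; the buffer is drained only once it is full. */
static void pebs_record(struct cpu_state *c, unsigned long ip)
{
	assert(c->nrec < PEBS_DEPTH);
	c->buf[c->nrec++].ip = ip;
	if (c->nrec == PEBS_DEPTH)
		drain_pebs(c);
}

/* Switch tasks; drain_first selects whether the buffer is flushed first. */
static void context_switch(struct cpu_state *c, int next_pid, int drain_first)
{
	if (drain_first)
		drain_pebs(c);
	c->cur_pid = next_pid;
}
```

With drain_first == 0, two records taken under pid 100 followed by two
under pid 200 are all attributed to pid 200 when the buffer fills. With
drain_first == 1, each pair gets the right pid.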


>
>> I do not believe substituting PEBS whenever you detect it is available AND
>> event supports it is a good idea. PEBS is not more precise than regular
>> sampling, in fact, it is statistically of poorer quality. This is due to the way
>> it works and it cannot be mitigated by randomization (at least with depth > 1).
>
> Right, which is why I already mentioned intending to use depth == 1 for
> things like the auto-freq (and possible future randomization).
>
okay.

>> The only improvement that PEBS provides is that you get an IP and the
>> machine state at retirement of an instruction that caused the event to
>> increment. Thus, the IP points to the next dynamic instruction. The instruction
>> is not the one that caused the P-th occurrence of the event, if you set the
>> period to P. It is at P+N, where N cannot be predicted and varies depending
>> on the event and executed code. This introduces some bias in the samples.
>
> I'm not sure I follow, it records the next event after overflow, doesn't
> that make it P+1?
>
That is not what I wrote. I did not say it records at P+1. I said it records
at P+N, where N varies from sample to sample and cannot be predicted.
N is expressed in the unit of the sampling event.
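
A toy model of the P+N behavior, as a sketch only. It assumes the event
stream is given as an array of IPs (one per occurrence of the sampling
event) and that N comes from a caller-supplied table, since the real N is
not visible to software. It also assumes counting restarts right after
the recorded occurrence. The function name and the skid table are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the occurrences a PEBS-like sampler would record.
 *
 * ev_ip[i] is the IP of the i-th occurrence of the sampling event.
 * For each period, the counter overflows on the P-th occurrence, but the
 * record describes the occurrence N later, with N = skid[k % nskid] for
 * the k-th sample. Returns the number of samples written to out[].
 */
static size_t pebs_sample(const unsigned long *ev_ip, size_t nev,
			  unsigned long period,
			  const unsigned *skid, size_t nskid,
			  unsigned long *out, size_t max_out)
{
	size_t nout = 0;
	size_t start = 0;   /* index of the first occurrence in this period */

	assert(period > 0 && nskid > 0);
	while (nout < max_out) {
		unsigned n = skid[nout % nskid];
		/* P-th occurrence is at start + period - 1; record is N later. */
		size_t hit = start + period - 1 + n;

		if (hit >= nev)
			break;
		out[nout++] = ev_ip[hit];
		start = hit + 1;    /* modelling assumption: count restarts here */
	}
	return nout;
}
```

With a skid table of {0} this degenerates to exact every-P-th sampling.
With a varying table such as {1, 3}, both the recorded occurrence and the
effective distance between samples drift from P, which is the bias
described above.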

> It doesn't matter how many instructions are between the P-th and P+1th
> event, you're counting events.
>
I did not talk about instructions but occurrences of the sampling event.

> One thing that is not quite clear to me is the influence of PEBS Trap,
> IA32_PERF_CAPABILITIES[6], that says to record after (trap like) when
> set, and before (fault like) when cleared, but then it goes on saying
> the IP is always the instruction after.

I have never played with Trap vs. Fault. I leave it at the default.
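
For reference, checking the bit in question is a one-liner once the MSR
value is in hand. The sketch below only decodes a value passed in by the
caller; how the value is obtained (rdmsr in the kernel, the msr device
from user space) is left out. The macro and function names are
illustrative, and 0x345 is the IA32_PERF_CAPABILITIES address as I
understand it from the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_PERF_CAPABILITIES 0x345 /* MSR address, per the Intel SDM */
#define PERF_CAP_PEBS_TRAP_BIT     6     /* IA32_PERF_CAPABILITIES[6] */

/* Return bit number 'bit' of a 64-bit MSR value as 0 or 1. */
static int msr_bit(uint64_t val, unsigned bit)
{
	assert(bit < 64);
	return (int)((val >> bit) & 1);
}

/* Nonzero if PEBS records are taken trap-like (after the instruction). */
static int pebs_is_trap_like(uint64_t perf_capabilities)
{
	return msr_bit(perf_capabilities, PERF_CAP_PEBS_TRAP_BIT);
}
```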

The IP is ALWAYS the address after the sampled instruction because it
is recorded at retirement of that instruction. Same thing with the machine
state. It is the state after the instruction retired. So if it increments a
register, you get the value after the increment.
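
A tool that wants the sampled instruction itself therefore has to step
back from the recorded IP. A minimal sketch, assuming the tool already
has a decoded list of instruction addresses and lengths for the region
(the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

struct insn {
	unsigned long addr; /* start address of the instruction */
	unsigned len;       /* encoded length in bytes */
};

/*
 * Return the index of the instruction that ends exactly at pebs_ip,
 * i.e. the one that retired just before the recorded IP, or -1 if none
 * in the table does.
 */
static int insn_before_ip(const struct insn *code, size_t n,
			  unsigned long pebs_ip)
{
	assert(code != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (code[i].addr + code[i].len == pebs_ip)
			return (int)i;
	return -1;
}
```

This only works for straight-line code. Because the recorded IP is that of
the next dynamic instruction, a sampled taken branch leaves the IP at the
branch target, and a lookup like this one cannot recover the branch.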

>
> If it means the register state before or after the instruction, then I
> don't know why they had to mess up the IP like they do :/
>
>> Given the behavior of PEBS, it would not be possible to correlate samples
>> obtained from two events with only one of them supporting PEBS. For instance,
>> if you sample on INST_RETIRED and UNHALTED_CORE_CYCLES. You
>> would get a PEBS profile for INST_RETIRED and a regular profile for CYCLES.
>> Given the skid differences, you would not be able to make fair comparisons.
>
> OK, good point.
>
>
>



--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00