Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support

From: Stephane Eranian
Date: Fri Mar 06 2015 - 14:52:00 EST


On Fri, Mar 6, 2015 at 1:37 PM, Vince Weaver <vincent.weaver@xxxxxxxxx> wrote:
> On Mon, 2 Mar 2015, Andi Kleen wrote:
>
>> > do not enable REGS_USER and REGS_INTR at the same time
>> > as REGS_USER will have REGS_INTR values and
>> > cannot be used for user stack unwinding
>>
>> If that's true it would be a bug. But I doubt it.
>>
>> The PEBS handler sets up its own pt_regs, so they should
>> be independent.
>
> I could be wrong here, but I was tracing through the code.
>
> If you trigger a PEBS interrupt (because you have precise_ip set)
> and you have both REGS_USER and REGS_INTR set, then
> __intel_pmu_pebs_event()
> is called from
> arch/x86/kernel/cpu/perf_event_intel_ds.c
>
> and in there it sets the regs values based solely on
>
> if (sample_type & PERF_SAMPLE_REGS_INTR) {
> }
>
> with those values copied into regs and then passed upstream through
> perf_event_overflow()
>
> so if the sample_type has *both* PERF_SAMPLE_REGS_INTR and
> PERF_SAMPLE_REGS_USER set, then the PERF_SAMPLE_REGS_USER values
> will have the same register values as the PERF_SAMPLE_REGS_INTR values.
>
> Maybe this is the expected behavior, or maybe I am missing something
> still.
>
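For concreteness, here is roughly the configuration under discussion: a precise (PEBS) event whose samples request both REGS_USER and REGS_INTR. This is a minimal sketch, not code from the man page; the helper setup_attr(), the cycles event, the sample period, and the register mask (bits 7 and 8, which I assume are SP and IP in the x86 PERF_REG_X86_* numbering) are all illustrative choices.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Illustrative register mask: bit 7 = SP, bit 8 = IP
 * (assumption: x86 PERF_REG_X86_* numbering). */
#define EXAMPLE_REG_MASK ((1ULL << 7) | (1ULL << 8))

/* Hypothetical helper: fill attr for the case discussed in this
 * thread, i.e. a precise (PEBS) event that asks for both the user
 * registers and the interrupt registers in every sample. */
static void setup_attr(struct perf_event_attr *attr, unsigned int precise)
{
    assert(precise <= 3); /* precise_ip is a 2-bit field */
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;  /* example event only */
    attr->sample_period = 100000;             /* example period only */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_USER |
                        PERF_SAMPLE_REGS_INTR;
    attr->sample_regs_user = EXAMPLE_REG_MASK;
    attr->sample_regs_intr = EXAMPLE_REG_MASK;
    attr->precise_ip = precise;               /* > 0 requests PEBS on Intel */
}
```

The filled-in attr would then be passed to perf_event_open(2) through syscall(SYS_perf_event_open, ...), since glibc provides no wrapper for it.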
If you look at perf_sample_regs_user(), it has 3 pt_regs. If the interrupt occurred
while in user mode, then regs_user gets regs, and those could have been updated
by PEBS if REGS_INTR is set. The question is: is this valid?
If the PEBS buffer holds a single entry, then you'd get the state at retirement
of the sampled instruction.
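To make the concern concrete, the selection logic described above can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel code: struct model_regs, struct model_task, and model_sample_regs_user() are hypothetical stand-ins, and I assume the user-mode test is made on the interrupt's segment state while the register values may already have been replaced by the PEBS record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs. Assumption: user_mode
 * reflects where the PMU interrupt hit, while ip may have been
 * overwritten with the PEBS record's value. */
struct model_regs {
    uint64_t ip;
    bool user_mode;
};

/* Hypothetical stand-in for the current task: the user regs saved at
 * the last kernel entry, and whether it has a user address space. */
struct model_task {
    struct model_regs saved_user_regs;
    bool has_mm;
};

/* Simplified model of how REGS_USER picks its register source.
 * 'regs' is what gets handed to perf_event_overflow(). */
static const struct model_regs *
model_sample_regs_user(const struct model_regs *regs,
                       const struct model_task *task)
{
    assert(regs != NULL && task != NULL);
    if (regs->user_mode)
        return regs;                   /* same regs that REGS_INTR reports */
    if (task->has_mm)
        return &task->saved_user_regs; /* last user state at kernel entry */
    return NULL;                       /* kernel thread: no user regs */
}
```

The first branch is the one at issue: when the interrupt lands in user mode, REGS_USER points at the very same regs that the PEBS handler filled in for REGS_INTR.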

The interrupt would come a bit later. If the pt_regs reflects user mode,
then the sampled instruction was either still in user mode or it was in
kernel mode. In the latter case, this is a problem because you are reporting
kernel state for REGS_USER. In the former case, you'd report the state of an
instruction that retired earlier than where the interrupt hit.
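A tool consuming REGS_USER samples can at least detect the latter case after the fact. A minimal sketch, assuming x86-64, where canonical kernel addresses have bit 63 set and user addresses have it clear; regs_user_ip_is_kernel() and count_suspect_samples() are hypothetical helpers, not part of any perf tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: x86-64 canonical addresses, kernel half has bit 63 set. */
static bool regs_user_ip_is_kernel(uint64_t ip)
{
    return (ip >> 63) != 0;
}

/* Count REGS_USER sample IPs that fall in the kernel half, i.e. samples
 * where kernel-mode PEBS state appears to have leaked into REGS_USER. */
static size_t count_suspect_samples(const uint64_t *ips, size_t n)
{
    assert(ips != NULL || n == 0);
    size_t suspect = 0;
    for (size_t i = 0; i < n; i++)
        if (regs_user_ip_is_kernel(ips[i]))
            suspect++;
    return suspect;
}
```

This cannot catch the former case: an IP from an earlier-retired user-mode instruction is still a perfectly valid user address.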

It boils down to the definition of REGS_USER: is it the last known
user-level state, or the interrupted user state?

For REGS_INTR:
- precise_ip = 0: machine state at PMU interrupt
- precise_ip > 0: machine state at retirement of PEBS sampled instruction

For REGS_USER:
- precise_ip = 0: last known user level machine state on PMU interrupt
- precise_ip > 0:
  - interrupt hit in user space: machine state at retirement of the
    PEBS sampled instruction
  - interrupt hit in kernel space: last known user level machine
    state on PMU interrupt
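The same definitions, written as a small lookup. The enum and function names are hypothetical and only restate the list above; they are not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Which machine state a sample's register set describes, following the
 * definitions in this message (a restatement, not a kernel API). */
enum regs_state {
    STATE_AT_PMU_INTERRUPT,       /* machine state at PMU interrupt */
    STATE_AT_PEBS_RETIREMENT,     /* state at retirement of the PEBS sampled insn */
    LAST_USER_STATE_AT_INTERRUPT  /* last known user-level state on PMU interrupt */
};

/* REGS_INTR depends only on whether PEBS is in use. */
static enum regs_state regs_intr_state(unsigned int precise_ip)
{
    assert(precise_ip <= 3); /* precise_ip is a 2-bit field */
    return precise_ip > 0 ? STATE_AT_PEBS_RETIREMENT : STATE_AT_PMU_INTERRUPT;
}

/* REGS_USER additionally depends on where the PMU interrupt landed. */
static enum regs_state regs_user_state(unsigned int precise_ip,
                                       bool interrupt_in_user)
{
    assert(precise_ip <= 3);
    if (precise_ip > 0 && interrupt_in_user)
        return STATE_AT_PEBS_RETIREMENT;
    return LAST_USER_STATE_AT_INTERRUPT;
}
```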

At least, that's how I think it currently works.
Do you agree, Vince?



> Vince
>