Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region

From: Jin, Yao
Date: Tue Jun 27 2017 - 22:15:07 EST


Hi,

In theory, PMIs that arrive in the skid region should be dropped, right?

For a userspace debugger, is relying on the *skid* PMI the only option?

Thanks
Jin Yao

On 6/28/2017 9:01 AM, Kyle Huey wrote:
Sent again with LKML CCd, sorry for the noise.

- Kyle

On Tue, Jun 27, 2017 at 5:38 PM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
Commit cc1582c231ea introduced a regression in v4.12.0-rc5, and it
appears to be a candidate for backporting to stable branches.

rr, a userspace record and replay debugger[0], uses the PMU interrupt
to stop a program during replay to inject asynchronous events such as
signals. We are counting retired conditional branches in userspace
only. This changeset causes the kernel to drop interrupts on the
floor if, during the PMU interrupt's "skid" region, the CPU enters
kernel mode for whatever reason. When replaying traces of complex
programs such as Firefox, we intermittently fail to deliver
asynchronous events on time, leading the replay to diverge from the
recorded state.

It seems like this change should, at a bare minimum, be limited to
counters that actually perform sampling of register state when the
interrupt fires. In our case, with the retired conditional branches
counter restricted to counting userspace events only, it makes no
difference that the PMU interrupt happened to be delivered in the
kernel.
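
The condition suggested above could be sketched roughly as follows.
This is illustrative userspace C, not kernel code and not a proposed
patch. The function names are hypothetical, and the choice of
sample_type bits treated as "sampling register state" is an
assumption.

```c
#include <linux/perf_event.h>
#include <stdbool.h>

/* Assumption: these are the sample_type bits that record state of the
 * interrupted context and could therefore expose kernel state. */
#define STATE_SAMPLING_BITS                                  \
    (PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN |                \
     PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_BRANCH_STACK)

/* True if the event records state of the interrupted context. */
bool event_samples_state(const struct perf_event_attr *attr)
{
    return (attr->sample_type & STATE_SAMPLING_BITS) != 0;
}

/* Drop an overflow only when a user-only event was interrupted in the
 * kernel AND the event would record state from that context. A pure
 * counting event (sample_type == 0) is always delivered. */
bool should_drop_skid_overflow(const struct perf_event_attr *attr,
                               bool interrupted_in_kernel)
{
    if (!attr->exclude_kernel || !interrupted_in_kernel)
        return false;
    return event_samples_state(attr);
}
```

Under this rule a counter like ours, with exclude_kernel set and no
sample_type bits, would still receive its interrupt even when the
skid carries it into the kernel.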

As this makes rr unusable on complex applications and cannot be
efficiently worked around, we would appreciate this being addressed
before 4.12 is finalized, and the regression not being introduced to
stable branches.

Thanks,

- Kyle

[0] http://rr-project.org/