Re: [PATCH] perf/core: generate overflow signal when samples are dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region)

From: Jin, Yao
Date: Wed Jun 28 2017 - 20:28:25 EST

On 6/29/2017 6:55 AM, Kyle Huey wrote:
On Wed, Jun 28, 2017 at 10:49 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
On Wed, Jun 28, 2017 at 09:48:27AM -0700, Kyle Huey wrote:
On Wed, Jun 28, 2017 at 3:56 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
@@ -6101,6 +6116,12 @@ void perf_prepare_sample(struct perf_event_header *header,
struct perf_output_handle handle;
struct perf_event_header header;

+ /*
+ * For security, drop the skid kernel samples if necessary.
+ */
+ if (!sample_is_allowed(event, regs))
+ return ret;
Just a bare return here.
Ugh, yes. Sorry about that. I'll fix that up.
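The review comment implies the function this hunk sits in returns `void` ("with that fixed to compile"), so the early exit has to be a bare `return;`. Below is a minimal user-space model of the corrected check. The structures and `perf_event_output_model()` are hypothetical. The body given to `sample_is_allowed()` is also an assumption, because the patch's own helper is not quoted here. In this model, a sample is disallowed when an `exclude_kernel` event's interrupt registers are in kernel mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel structures (illustrative only). */
struct pt_regs {
	bool user_mode;			/* true if the interrupt hit user space */
};

struct perf_event {
	bool exclude_kernel;		/* models attr.exclude_kernel */
	unsigned long nr_written;	/* samples that reached the buffer */
};

/* Hypothetical model: kernel-mode regs on an exclude_kernel event are
 * "skid" and must not be reported to userspace. */
static bool sample_is_allowed(const struct perf_event *event,
			      const struct pt_regs *regs)
{
	assert(event != NULL && regs != NULL);
	return !(event->exclude_kernel && !regs->user_mode);
}

/* The output path returns void, so the early exit is a bare return. */
static void perf_event_output_model(struct perf_event *event,
				    const struct pt_regs *regs)
{
	/* For security, drop the skid kernel samples if necessary. */
	if (!sample_is_allowed(event, regs))
		return;

	event->nr_written++;		/* stands in for writing the record */
}
```

On an `exclude_kernel` event, one call with user-mode regs and one with kernel-mode regs leaves `nr_written` at 1.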

[...]

I can confirm that with that fixed to compile, this patch fixes rr.
Thanks for giving this a go.

Having thought about this some more, I think Vince does make a good
point that throwing away samples is liable to break stuff, e.g. tools
that rely only on (non-sensitive) samples.

It still seems wrong to make up data, though.

Maybe for exclude_kernel && !exclude_user events we can always generate
samples from the user regs, rather than the exception regs. That's going
to be closer to what the user wants, regardless. I'll take a look
tomorrow.
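The idea of reporting user regs instead of exception regs for `exclude_kernel && !exclude_user` events can be sketched as a register-selection step. This is a hypothetical model, not kernel code. `choose_sample_regs()` and `struct attr_model` are made-up names. It also assumes the caller can supply the task's saved user-mode registers, or `NULL` when none are available. Falling back to dropping (returning `NULL`) avoids leaking kernel registers when no user state exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (illustrative only, not the kernel definitions). */
struct pt_regs {
	bool user_mode;		/* interrupt hit user space */
	unsigned long ip;	/* instruction pointer */
};

struct attr_model {
	bool exclude_kernel;
	bool exclude_user;
};

/*
 * Hypothetical helper: pick the registers a sample reports.
 * exception_regs: regs at the PMU interrupt.
 * user_regs: the task's saved user-mode regs, or NULL if unavailable.
 * Returns NULL when the sample should be dropped instead.
 */
static const struct pt_regs *
choose_sample_regs(const struct attr_model *attr,
		   const struct pt_regs *exception_regs,
		   const struct pt_regs *user_regs)
{
	assert(attr != NULL && exception_regs != NULL);

	/* Interrupt landed in user space, or the kernel is counted anyway. */
	if (exception_regs->user_mode || !attr->exclude_kernel)
		return exception_regs;

	/* User-only event that skidded into the kernel: report the
	 * user-mode state rather than the kernel registers. */
	if (!attr->exclude_user && user_regs != NULL)
		return user_regs;

	/* Nothing safe to report. */
	return NULL;
}
```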
I'm not very familiar with the kernel internals, but the reason I
didn't suggest this originally is that it seems like it will be difficult
to determine what the "correct" userspace registers are. For example,
what happens if a performance counter is fixed to a given tid, the
interrupt fires during a context switch from that task to another that
is not being monitored, and the kernel is far enough along in the
context switch that the current task struct has been switched out?
Reporting the new task's registers seems as bad as reporting the
kernel's registers. But maybe this is easier than I imagine for
whatever reason.

Something to think about.
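The context-switch concern comes down to whether `current` is still the monitored thread when the interrupt is handled. The sketch below only shows where such a check would sit. It assumes the monitored thread id and the current task are both known reliably at interrupt time, and that assumption is exactly what is being questioned here. `struct task_model` and `regs_for_per_thread_sample()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (illustrative only). */
struct pt_regs {
	bool user_mode;
};

struct task_model {
	int tid;			/* thread id of this task */
	struct pt_regs user_regs;	/* its saved user-mode registers */
};

/* Hypothetical: choose regs for a sample from an event bound to
 * target_tid. Returns NULL when the sample should be dropped. */
static const struct pt_regs *
regs_for_per_thread_sample(int target_tid, const struct task_model *curr,
			   const struct pt_regs *exception_regs)
{
	assert(curr != NULL && exception_regs != NULL);

	/* Mid context switch, 'current' may already be an unmonitored task:
	 * its registers are as wrong as the kernel's, so drop. */
	if (curr->tid != target_tid)
		return NULL;

	if (exception_regs->user_mode)
		return exception_regs;	/* no skid: report as-is */

	return &curr->user_regs;	/* skid into kernel: use user state */
}
```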

- Kyle

Yes, I think so.

The skid interrupt may be triggered in the wrong context and report wrong information (e.g. the wrong regs) to userspace.

That's why I think the *skid* interrupt should be dropped.
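The patch named in the subject line separates two things: the skid sample record is dropped, but the overflow signal is still generated. Presumably, consumers that only need the notification keep working that way. Below is a simplified model of that split. `struct event_model`, its counters, and `overflow_model()` are hypothetical. The real kernel path (irq work, signal delivery, ring-buffer output) is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (illustrative only). */
struct pt_regs {
	bool user_mode;
};

struct event_model {
	bool exclude_kernel;
	unsigned long signals_pending;	/* overflow notifications queued */
	unsigned long samples_written;	/* sample records emitted */
	unsigned long samples_dropped;	/* skid samples discarded */
};

/* Hypothetical overflow handler: always generate the notification, but
 * do not write a kernel-mode sample for an exclude_kernel event. */
static void overflow_model(struct event_model *ev, const struct pt_regs *regs)
{
	assert(ev != NULL && regs != NULL);

	ev->signals_pending++;

	if (ev->exclude_kernel && !regs->user_mode) {
		ev->samples_dropped++;
		return;
	}
	ev->samples_written++;
}
```

One skid overflow followed by one clean overflow yields two pending signals, one written sample, and one dropped sample.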

Thanks
Jin Yao