Re: [PATCH v1 1/2] perf/core: Use sysctl to turn on/off dropping leaked kernel samples

From: Mark Rutland
Date: Fri Jun 15 2018 - 07:36:19 EST


On Fri, Jun 15, 2018 at 06:03:22PM +0800, Jin Yao wrote:
> When doing sampling, for example:
>
> perf record -e cycles:u ...
>
> On workloads that do a lot of kernel entry/exits we see kernel
> samples, even though :u is specified. This is due to skid existing.
>
> This might be a security issue because it can leak kernel addresses even
> though kernel sampling support is disabled.
>
> One patch "perf/core: Drop kernel samples even though :u is specified"
> was posted in last year but it was reverted because it introduced a
> regression issue that broke the rr-project, which used sampling
> events to receive a signal on overflow. These signals were critical
> to the correct operation of rr.
>
> See '6a8a75f32357 ("Revert "perf/core: Drop kernel samples even
> though :u is specified"")' for detail.
>
> Now the idea is to use sysctl to control the dropping of leaked
> kernel samples.
>
> /sys/devices/cpu/perf_allow_sample_leakage:
>
> 0 - default, drop the leaked kernel samples.
> 1 - don't drop the leaked kernel samples.

Does this need to be conditional at all?

At least for sampling the GPRs, we could do something like below
unconditionally, which seems sufficient for my test cases.

Mark.

---->8----
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 67612ce359ad..79a21531d57c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6359,6 +6359,24 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
return callchain ?: &__empty_callchain;
}

+static struct pt_regs *perf_get_sample_regs(struct perf_event *event, struct pt_regs *regs)
+{
+ /*
+ * Due to interrupt latency (AKA "skid"), we may enter the kernel
+ * before taking an overflow, even if the PMU is only counting user
+ * events.
+ *
+ * If we're not counting kernel events, always use the user regs when
+ * sampling.
+ *
+ * TODO: how does this interact with guest sampling?
+ */
+ if (event->attr.exclude_kernel && !user_mode(regs))
+ return task_pt_regs(current);
+
+ return regs;
+}
+
void perf_prepare_sample(struct perf_event_header *header,
struct perf_sample_data *data,
struct perf_event *event,
@@ -6366,6 +6384,8 @@ void perf_prepare_sample(struct perf_event_header *header,
{
u64 sample_type = event->attr.sample_type;

+ regs = perf_get_sample_regs(event, regs);
+
header->type = PERF_RECORD_SAMPLE;
header->size = sizeof(*header) + event->header_size;