Re: [PATCH v2] perf/core: Drop kernel samples even though :u is specified

From: Mark Rutland
Date: Tue May 23 2017 - 06:59:07 EST


Hi,

On Tue, May 23, 2017 at 06:16:10PM +0800, Jin Yao wrote:
> When doing sampling, for example,
>
> perf record -e cycles:u ...
>
> On workloads that do a lot of kernel entry/exits we see kernel
> samples, even though :u is specified. This is due to skid existing.
>
> This is a security issue because it can leak kernel addresses even
> though kernel sampling support is disabled.
>
> The patch drops the kernel samples if exclude_kernel is specified.

[...]

> +static bool skid_kernel_samples(struct perf_event *event, struct pt_regs *regs)

The name is a bit opaque, especially where it is used in
__perf_event_overflow().

How about we invert the polarity and call this sample_is_allowed() ?

> +{
> +	/*
> +	 * We may get kernel samples even though exclude_kernel
> +	 * is specified due to potential skid in sampling.
> +	 * The skid kernel samples could be dropped or just do
> +	 * nothing by testing the flag PERF_PMU_CAP_NO_SKID.
> +	 */
> +	if (event->pmu->capabilities & PERF_PMU_CAP_NO_SKID)
> +		return false;

Do we need this new cap?

I'd expect user_mode(regs) to be about as cheap as testing the cap, and
the common case is going to be that we have to test both.

For those PMUs without skid, when not sampling the kernel,
user_mode(regs) should always be true.

IMO, it would make more sense to just check user_mode(regs), which also
avoids any surprises with unexpected skid...
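
To make that concrete, here is a minimal userspace sketch (not kernel code) comparing the v2 logic with a plain user_mode() check. The struct mock_event and the in_user_mode flag are hypothetical stand-ins for struct perf_event, the PMU capability bit, and user_mode(regs); only the boolean logic is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct perf_event used here. */
struct mock_event {
	bool pmu_no_skid;	/* models PERF_PMU_CAP_NO_SKID being set */
	bool exclude_kernel;	/* models event->attr.exclude_kernel */
	bool sample_ip;		/* models sample_type & PERF_SAMPLE_IP */
};

/* v2 logic: trust the cap first, then look at the mode. */
bool drop_with_cap(const struct mock_event *e, bool in_user_mode)
{
	if (e->pmu_no_skid)
		return false;

	return e->exclude_kernel && !in_user_mode && e->sample_ip;
}

/* Suggested logic: look at the mode only. */
bool drop_without_cap(const struct mock_event *e, bool in_user_mode)
{
	return e->exclude_kernel && !in_user_mode && e->sample_ip;
}

/* Count the input combinations on which the two variants disagree. */
int count_disagreements(void)
{
	int n = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct mock_event e = {
			.pmu_no_skid	= bits & 1,
			.exclude_kernel	= bits & 2,
			.sample_ip	= bits & 4,
		};
		bool in_user_mode = bits & 8;

		if (drop_with_cap(&e, in_user_mode) !=
		    drop_without_cap(&e, in_user_mode))
			n++;
	}

	return n;
}
```

In this model the two variants differ on a single input: a PMU that advertises no skid but still delivers a kernel-mode sample with exclude_kernel set. That is the "unexpected skid" case, and it is the one we would want to drop.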

> +
> +	if (event->attr.exclude_kernel &&
> +	    !user_mode(regs) &&
> +	    (event->attr.sample_type & PERF_SAMPLE_IP)) {
> +		return true;
> +	}
> +
> +	return false;
> +}

How about:

static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
{
	/*
	 * Due to interrupt latency (AKA "skid"), we may enter the
	 * kernel before taking an overflow, even if the PMU is only
	 * counting user events.
	 *
	 * To avoid leaking information to userspace, we must always
	 * reject kernel samples when exclude_kernel is set.
	 */
	if (!user_mode(regs) && event->attr.exclude_kernel &&
	    (event->attr.sample_type & PERF_SAMPLE_IP))
		return false;

	return true;
}

... do we need to reject any other sample types, or do we definitely
avoid leaks by other means?
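
If it turns out that other sample types can expose kernel state too, the conservative option would be to drop the sample_type test entirely. A rough userspace sketch of the difference follows. The MOCK_SAMPLE_* bit values and struct mock_event are made up for illustration and are not the real perf ABI, and MOCK_SAMPLE_OTHER stands for any non-IP sample type without claiming that a particular one leaks.

```c
#include <stdbool.h>

/* Made-up flag values, for illustration only. */
#define MOCK_SAMPLE_IP		(1u << 0)
#define MOCK_SAMPLE_OTHER	(1u << 1)	/* any non-IP sample type */

/* Hypothetical stand-in for the relevant perf_event_attr fields. */
struct mock_event {
	bool exclude_kernel;
	unsigned int sample_type;
};

/* As proposed above: only IP samples are rejected. */
bool allowed_ip_only(const struct mock_event *e, bool in_user_mode)
{
	if (!in_user_mode && e->exclude_kernel &&
	    (e->sample_type & MOCK_SAMPLE_IP))
		return false;

	return true;
}

/* Conservative variant: reject every kernel-mode sample. */
bool allowed_strict(const struct mock_event *e, bool in_user_mode)
{
	if (!in_user_mode && e->exclude_kernel)
		return false;

	return true;
}
```

The two only differ for kernel-mode samples on an exclude_kernel event that does not request the IP, so the choice comes down to whether such samples can carry anything sensitive.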

> +
> /*
> * Generic event overflow handling, sampling.
> */
> @@ -7337,6 +7357,12 @@ static int __perf_event_overflow(struct perf_event *event,
>  	ret = __perf_event_account_interrupt(event, throttle);
>
>  	/*
> +	 * For security, drop the skid kernel samples if necessary.
> +	 */
> +	if (skid_kernel_samples(event, regs))
> +		return ret;
> +

... with the above changes, this can be:

	if (!sample_is_allowed(event, regs))
		return ret;
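
For reference, a small userspace model of the resulting control flow. It assumes, as in the hunk above, that the check sits after the interrupt accounting, so a dropped sample is still accounted. Everything prefixed mock_ is hypothetical, and the sample_is_allowed() here takes a plain bool in place of struct pt_regs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct perf_event, plus two counters. */
struct mock_event {
	bool exclude_kernel;
	bool sample_ip;		/* models sample_type & PERF_SAMPLE_IP */
	int interrupts;		/* bumped by the accounting step */
	int samples;		/* bumped when a sample is emitted */
};

static bool sample_is_allowed(const struct mock_event *e, bool in_user_mode)
{
	if (!in_user_mode && e->exclude_kernel && e->sample_ip)
		return false;

	return true;
}

/* Models the accounting step; this mock never throttles, so returns 0. */
static int mock_account_interrupt(struct mock_event *e)
{
	e->interrupts++;
	return 0;
}

/* Models the overflow path: account first, then decide whether to sample. */
int mock_event_overflow(struct mock_event *e, bool in_user_mode)
{
	int ret = mock_account_interrupt(e);

	if (!sample_is_allowed(e, in_user_mode))
		return ret;

	e->samples++;
	return ret;
}
```

In this model a skid sample still counts as an interrupt but never produces a sample record.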

Thanks,
Mark.