Re: [PATCH v1 0/2] perf: Drop leaked kernel samples

From: Kyle Huey
Date: Fri Jun 15 2018 - 13:16:49 EST


On Thu, Jun 14, 2018 at 10:11 PM, Jin, Yao <yao.jin@xxxxxxxxxxxxxxx> wrote:
>
>
> On 6/15/2018 11:35 AM, Kyle Huey wrote:
>>
>> I strongly object to this patch as written. As I said a year ago, when
>> I originally reported[0] the regression introduced by the previous
>> version of this patch:
>>
>> "It seems like this change should, at a bare minimum, be limited to
>> counters that actually perform sampling of register state when the
>> interrupt fires. In our case, with the retired conditional branches
>> counter restricted to counting userspace events only, it makes no
>> difference that the PMU interrupt happened to be delivered in the
>> kernel."
>>
>> This means identifying which values of `perf_event_attr::sample_type`
>> are security concerns (presumably PERF_SAMPLE_IP is, and
>> PERF_SAMPLE_TIME is not, and someone needs to go through and decide on
>> all of them) and filtering on those values for this new behavior.
>>
>> And because rr sets its sample_type to 0, once you do that, the sysctl
>> will not be necessary.
>>
>> - Kyle
>>
>
> Since rr sets sample_type to 0, the easiest way is to add checking like:
>
> if (event->attr.sample_type) {
>     if (event->attr.exclude_kernel && !user_mode(regs))
>         return false;
> }
>
> So rr doesn't need to be changed, and for other use cases the leaked
> kernel samples will be dropped.
>
> But I don't like this because:
>
> 1. It's too specific to the rr case.

Keeping existing software working is the first rule of kernel development!

There is no disclosure of kernel space state in the way rr uses this
API, so there is no reason that this API should not keep working.
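
To make the configuration under discussion concrete, here is a minimal
sketch of a counting-only event of the kind described above: sample_type
is 0 and exclude_kernel is set, so an overflow interrupt only delivers a
notification and no sample record. This is an illustration, not rr's
actual setup code. The helper name make_counting_only_attr is
hypothetical, and the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS event is
only a stand-in for the CPU-specific retired conditional branches
counter.

```c
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: build an attr for a userspace-only counter that
 * interrupts every `period` events but records no sample data.
 */
static struct perf_event_attr make_counting_only_attr(unsigned long long period)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    /* Stand-in event; the real counter is a CPU-specific raw event. */
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr.sample_period = period;   /* interrupt after this many events */
    attr.sample_type = 0;          /* no IP, registers, or other state sampled */
    attr.exclude_kernel = 1;       /* count userspace events only */
    return attr;
}
```

An attr built this way would then be passed to perf_event_open(2). With
sample_type == 0 there is no sample payload through which kernel state
could be disclosed, even if the interrupt itself arrives in the kernel.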

> 2. If we create a new sample_type, e.g. PERF_SAMPLE_ALLOW_LEAKAGE, the code
> will be:
>
> if (!(event->attr.sample_type & PERF_SAMPLE_ALLOW_LEAKAGE)) {
>     if (event->attr.exclude_kernel && !user_mode(regs))
>         return false;
> }
>
> But rr needs to add PERF_SAMPLE_ALLOW_LEAKAGE to sample_type since by
> default the bit is not set.

There's no reason to add a new PERF_SAMPLE flag. You need to audit the
*existing* PERF_SAMPLE flags and figure out which ones are problems,
and then do

if (event->attr.exclude_kernel && !user_mode(regs) &&
    sampling_discloses_kernel_information(event->attr.sample_type)) {
        return false;
}
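
As a rough userspace model of that check, the sketch below shows one way
the predicate could look. The audit of the PERF_SAMPLE flags has not
been done, so the mask here is a placeholder: it assumes only
PERF_SAMPLE_TIME has been cleared as harmless and conservatively treats
every other flag as disclosing until someone decides otherwise. The
names SAMPLE_TYPES_AUDITED_SAFE and should_drop_sample are hypothetical,
and the two bool parameters stand in for event->attr.exclude_kernel and
user_mode(regs).

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Placeholder audit result (assumption): flags known not to expose kernel state. */
#define SAMPLE_TYPES_AUDITED_SAFE ((uint64_t)PERF_SAMPLE_TIME)

/* True if any requested sample field has not been cleared as harmless. */
static bool sampling_discloses_kernel_information(uint64_t sample_type)
{
    return (sample_type & ~SAMPLE_TYPES_AUDITED_SAFE) != 0;
}

/*
 * Model of the drop decision: returns true where the kernel snippet
 * above would `return false` and discard the sample.
 */
static bool should_drop_sample(uint64_t sample_type, bool exclude_kernel,
                               bool in_user_mode)
{
    return exclude_kernel && !in_user_mode &&
           sampling_discloses_kernel_information(sample_type);
}
```

Under this model an event with sample_type == 0 is never dropped, while
an exclude_kernel event that requests PERF_SAMPLE_IP is dropped whenever
the interrupt lands in kernel mode.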

> 3. A sysctl is more flexible. It gives us an option: when we want to
> check whether skid exists, we can use the sysctl to turn that on.

If you want a sysctl for your own reasons that's fine. But we don't
want a sysctl. We want to work without any further configuration.

> Thanks
> Jin Yao
>

- Kyle