Re: [RFC] perf/x86: Only expose userspace rdpmc for events on current CPU

From: Rob Herring
Date: Tue Jan 12 2021 - 17:00:18 EST


On Tue, Jan 12, 2021 at 11:05 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 12, 2021 at 10:16:50AM -0600, Rob Herring wrote:
> > On Tue, Jan 12, 2021 at 9:33 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Jan 07, 2021 at 05:01:36PM -0700, Rob Herring wrote:
> > > > Userspace access using rdpmc only makes sense if the event is valid for
> > > > the current CPU. However, cap_user_rdpmc is currently set no matter which
> > > > CPU the event is associated with. The result is userspace reading another
> > > > CPU's event thinks it can use rdpmc to read the counter. In doing so, the
> > > > wrong counter will be read.
> > >
> > > Don't do that then?
> >
> > I could check this in userspace I suppose, but then it's yet another
> > thing the rdpmc loop has to check. I think it's better to not add more
> > overhead there.
>
> So all this was designed for self monitoring; attempting rdpmc on an
> event not for yourself is out of spec.
>
> > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > index a88c94d65693..6e6d4c1d03ca 100644
> > > > --- a/arch/x86/events/core.c
> > > > +++ b/arch/x86/events/core.c
> > > > @@ -2490,7 +2490,8 @@ void arch_perf_update_userpage(struct perf_event *event,
> > > >  	userpg->cap_user_time = 0;
> > > >  	userpg->cap_user_time_zero = 0;
> > > >  	userpg->cap_user_rdpmc =
> > > > -		!!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED);
> > > > +		!!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED) &&
> > > > +		(event->oncpu == smp_processor_id());
> > > >  	userpg->pmc_width = x86_pmu.cntval_bits;
> > > >
> > > >  	if (!using_native_sched_clock() || !sched_clock_stable())
> > >
> > > Isn't that a nop? That is, from the few sites I checked, we're always
> > > calling this on the event's CPU.
> >
> > If cpu0 opens and mmaps an event for cpu1, then cpu0 will see
> > cap_user_rdpmc set and think it can use rdpmc.
>
> I don't think your check helps with that. IIRC we always call
> arch_perf_update_userpage() on the CPU the event actually runs on. So
> it's always true.

My testing says otherwise. I tested this change on the arm64 version
of arch_perf_update_userpage, but I don't think x86 should be any
different here.

I'm testing with libperf test_stat_cpu() modified to mmap each cpu
event. Without the change I get the following result:

# taskset 2 test-evsel-a -v
- running test-evsel.c...
mmap base 0xffff9fd77000
userspace counter access enabled on cpu0 <<<<< Reflects cap_user_rdpmc state
cpu0: count = 0x72cf, ena = 0x1a838, run = 0x1a838 <<<<< count is from rdpmc
mmap base 0xffff9fd76000
userspace counter access enabled on cpu1
cpu1: count = 0xc978, ena = 0x163e6, run = 0x163e6 <<<<< count is from rdpmc

cpu0 is idle here (taskset 2 pins the test to cpu1), so we'd expect the
cpu0 count to be near zero, but it's not.

Then with the change, I get the following:

# taskset 2 test-evsel-a -v
- running test-evsel.c...
mmap base 0xffffa742d000
cpu0: count = 0xddb, ena = 0x3f8d6, run = 0x3f8d6 <<<<< count is from read()
mmap base 0xffffa742c000
userspace counter access enabled on cpu1
cpu1: count = 0xb538, ena = 0x154f0, run = 0x154f0 <<<<< count is from rdpmc

# taskset 1 test-evsel-a -v
- running test-evsel.c...
mmap base 0xffff8b008000
userspace counter access enabled on cpu0
cpu0: count = 0x7c21, ena = 0x18574, run = 0x18574
mmap base 0xffff8b007000
cpu1: count = 0xb3c, ena = 0x61aa8, run = 0x61aa8

As you can see, the count now tracks which cpu is idle and which is
busy, and cap_user_rdpmc is only set for the cpu event matching the cpu
we are running on.
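For reference, here is a rough sketch of the userspace self-monitoring
read loop. It is not the libperf code. It is modeled on the pseudo-code in
the comments of struct perf_event_mmap_page in
include/uapi/linux/perf_event.h, with time_enabled/time_running scaling
left out. mmap_read_self(), sign_extend(), fake_rdpmc() and x86_rdpmc()
are hypothetical names. The point is that cap_user_rdpmc (plus a non-zero
index) is the only thing the loop checks before issuing rdpmc, so if the
kernel advertises it for an event bound to another CPU, the read samples
a counter on whatever CPU the reader happens to be running on.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/* Compiler barrier, as in the perf_event.h pseudo-code: keeps the compiler
 * from moving the mmap-page reads across the seqlock checks. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Sign-extend a raw counter value that is only `width` bits wide. */
static int64_t sign_extend(uint64_t raw, unsigned int width)
{
	unsigned int shift;

	if (width == 0 || width >= 64)
		return (int64_t)raw;
	shift = 64 - width;
	return (int64_t)(raw << shift) >> shift;
}

/* Hypothetical hook standing in for the rdpmc instruction. */
typedef uint64_t (*read_pmc_fn)(uint32_t idx);

#if defined(__x86_64__) || defined(__i386__)
/* What the hook would be on x86: rdpmc with the counter number in ECX.
 * Faults unless the kernel has enabled userspace rdpmc (CR4.PCE). */
static __attribute__((unused)) uint64_t x86_rdpmc(uint32_t idx)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx));
	return ((uint64_t)hi << 32) | lo;
}
#endif

/* Hypothetical stand-in so the sketch runs without PMU access:
 * pretends every hardware counter currently holds 0x1234. */
static uint64_t fake_rdpmc(uint32_t idx)
{
	(void)idx;
	return 0x1234;
}

/* Seqlock-protected self-monitoring read. Returns true and stores the
 * count in *out if the userspace path is usable, or false if the caller
 * must fall back to read() on the event fd. */
static bool mmap_read_self(volatile struct perf_event_mmap_page *pc,
			   read_pmc_fn read_pmc, uint64_t *out)
{
	uint32_t seq, idx;
	uint64_t count;

	do {
		seq = pc->lock;
		barrier();

		idx = pc->index;
		/* cap_user_rdpmc is the only hint that rdpmc is valid here;
		 * index == 0 means the event is not on a hardware counter. */
		if (!pc->cap_user_rdpmc || !idx)
			return false;

		count = (uint64_t)pc->offset;
		count += (uint64_t)sign_extend(read_pmc(idx - 1), pc->pmc_width);

		barrier();
	} while (pc->lock != seq);

	*out = count;
	return true;
}
```

In the taskset 2 run without the change, the cpu0 event's page had
cap_user_rdpmc set, so a loop like this takes the rdpmc branch while
running on cpu1. With the change it returns false for the cpu0 event and
the caller falls back to read(), matching the "count is from read()" line.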

Rob