Re: [PATCH] perf test x86: address multiplexing in rdpmc test

From: Ian Rogers
Date: Sat Mar 21 2020 - 13:45:06 EST


On Sat, Mar 21, 2020 at 6:41 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Mar 20, 2020 at 05:14:00PM -0700, Ian Rogers wrote:
> > Counters may be being used for pinned or other events which inhibit the
> > instruction counter in the test from being scheduled - time_enabled > 0
> > but time_running == 0. This causes the test to fail with division by 0.
> > Add a sleep loop to ensure that the counter is run before computing
> > the count.
> >
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > ---
> > tools/perf/arch/x86/tests/rdpmc.c | 45 ++++++++++++++++++++++++-------
> > 1 file changed, 35 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/arch/x86/tests/rdpmc.c b/tools/perf/arch/x86/tests/rdpmc.c
> > index 1ea916656a2d..0b0792ae67f7 100644
> > --- a/tools/perf/arch/x86/tests/rdpmc.c
> > +++ b/tools/perf/arch/x86/tests/rdpmc.c
> > @@ -34,19 +34,35 @@ static u64 rdtsc(void)
> > return low | ((u64)high) << 32;
> > }
> >
> > -static u64 mmap_read_self(void *addr)
> > +static u64 mmap_read_self(void *addr, bool *error)
> > {
> > struct perf_event_mmap_page *pc = addr;
> > - u32 seq, idx, time_mult = 0, time_shift = 0;
> > + u32 seq, idx, time_mult = 0, time_shift = 0, sleep_count = 0;
> > u64 count, cyc = 0, time_offset = 0, enabled, running, delta;
> >
> > + *error = false;
> > do {
> > - seq = pc->lock;
> > - barrier();
> > -
> > - enabled = pc->time_enabled;
> > - running = pc->time_running;
> > -
> > + do {
> > + seq = pc->lock;
> > + barrier();
> > +
> > + enabled = pc->time_enabled;
> > + running = pc->time_running;
> > +
> > + if (running == 0) {
>
> This is not in fact the condition the Changelog calls out.

Not sure I follow. As in the multiplexing? It is exactly the condition
that time_running == 0.

> > + /*
> > + * Event hasn't run, this may be caused by
> > + * multiplexing.
> > + */
> > + sleep_count++;
> > + if (sleep_count > 60) {
> > + pr_err("Event failed to run. Are higher priority counters being sampled by a different process?\n");
> > + *error = true;
> > + return 0;
> > + }
> > + sleep(1);
> > + }
> > + } while (running == 0);
>
>
> Also, I would much prefer this test to be in the caller of this
> function, and not deface this function.
>
> I'd prefer this function to stay representative of the outlines in
> uapi/linux/perf_event.h and an example of how to actually use it.

I 100% agree the code here should line up with
uapi/linux/perf_event.h, there are already small divergences. I've
tried to address these comments in v2 here:
https://lkml.org/lkml/2020/3/21/325
Of course, I don't mind just being the person reporting this issue and
someone with a stronger opinion on the API to propose something better
and line up the perf_event.h comments. Another option is to fail the
test if the counter never runs rather than sleeping. That would be
unfortunate for some of the testing conditions we have, and we'd have
to run the test less frequently.

Thanks,
Ian

> > if (enabled != running) {
> > cyc = rdtsc();
> > time_mult = pc->time_mult;
> > @@ -131,13 +147,22 @@ static int __test__rdpmc(void)
> >
> > for (n = 0; n < 6; n++) {
> > u64 stamp, now, delta;
> > + bool error;
> >
> > - stamp = mmap_read_self(addr);
> > + stamp = mmap_read_self(addr, &error);
> > + if (error) {
> > + delta_sum = 0;
> > + goto out_close;
> > + }
> >
> > for (i = 0; i < loops; i++)
> > tmp++;
> >
> > - now = mmap_read_self(addr);
> > + now = mmap_read_self(addr, &error);
> > + if (error) {
> > + delta_sum = 0;
> > + goto out_close;
> > + }
> > loops *= 10;
> >
> > delta = now - stamp;
> > --
> > 2.25.1.696.g5e7596f4ac-goog
> >