Re: [PATCH v3] perf/core: Set event shadow time for inactive events too

From: Peter Zijlstra
Date: Thu Dec 09 2021 - 03:21:52 EST


On Wed, Dec 08, 2021 at 09:52:16PM -0800, Namhyung Kim wrote:
> Hi Peter,
>
> On Wed, Dec 8, 2021 at 3:22 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Sun, Dec 05, 2021 at 02:48:43PM -0800, Namhyung Kim wrote:
> > > While commit f79256532682 ("perf/core: fix userpage->time_enabled of
> > > inactive events") fixed this problem for user rdpmc usage,
> >
> > You're referring to 'this problem' before actually describing a problem :-(
>
> Well, it's a problem of reporting incorrect 'enabled' time.
> I'm sorry if it was not clear.
>
> >
> > Also, you now have me looking at that commit again, and I'm still hating
> > it. Also, I'm again struggling to make sense of it; all except the very
> > last hunk that is.
> >
> > So the whole, full-fat, mmap self-monitor thing looks like:
> >
> >
> > u32 seq, time_mult, time_shift, index, width = 64;
> > u64 count, enabled, running;
> > u64 cyc, time_offset, time_cycles = 0, time_mask = ~0ULL;
> > u64 quot, rem, delta;
> > s64 pmc = 0;
> >
> > do {
> > 	seq = pc->lock;
> > 	barrier();
> >
> > 	enabled = pc->time_enabled;
> > 	running = pc->time_running;
> >
> > 	if (pc->cap_user_time && enabled != running) {
> > 		cyc = rdtsc();
> > 		time_offset = pc->time_offset;
> > 		time_mult = pc->time_mult;
> > 		time_shift = pc->time_shift;
> > 	}
> >
> > 	if (pc->cap_user_time_short) {
> > 		time_cycles = pc->time_cycles;
> > 		time_mask = pc->time_mask;
> > 	}
> >
> > 	index = pc->index;
> > 	count = pc->offset;
> > 	if (pc->cap_user_rdpmc && index) {
> > 		width = pc->pmc_width;
> > 		pmc = rdpmc(index - 1);
> > 	}
> >
> > 	barrier();
> > } while (pc->lock != seq);
> >
> > if (width < 64) {
> > 	pmc <<= 64 - width;
> > 	pmc >>= 64 - width;
> > }
> > count += pmc;
> >
> > cyc = time_cycles + ((cyc - time_cycles) & time_mask);
> >
> > quot = (cyc >> time_shift);
> > rem = cyc & ((1ULL << time_shift) - 1);
> > delta = time_offset + quot * time_mult +
> > 		((rem * time_mult) >> time_shift);
> >
> > enabled += delta;
> > if (index)
> > 	running += delta;
> >
> > quot = count / running;
> > rem = count % running;
> > count = quot * enabled + (rem * enabled) / running;
> >
> >
> > Now, the thing that sticks out to me is that 'enabled' is
> > unconditionally advanced. It *always* runs.
> >
> > So how can not updating ->time_enabled when the counter is INACTIVE due
> > to rotation (which causes ->index == 0), cause enabled to not be
> > up-to-date?
>
> Hmm.. I don't get it. In my understanding, that's the whole point
> of the enabled time: it keeps advancing while the event is not active
> due to multiplexing (rotation), so that users can scale the count
> by the ratio of enabled to running time.

Correct, and AFAICT that works as advertised.
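
To make the scaling concrete, here is a minimal sketch of it as a
standalone helper. The name scale_count() is made up for illustration;
it is not a kernel or perf tool API. It mirrors the last three lines of
the self-monitoring sequence quoted above, plus a guard for
running == 0 that the quoted sequence does not have.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kernel or of the patch under
 * discussion: estimate what 'count' would have been had the event been
 * running for the whole time it was enabled.
 */
static uint64_t scale_count(uint64_t count, uint64_t enabled, uint64_t running)
{
	uint64_t quot, rem;

	/* Never scheduled: there is nothing to extrapolate from. */
	if (running == 0)
		return 0;

	/* Split the division so that count * enabled is never formed directly. */
	quot = count / running;
	rem = count % running;
	return quot * enabled + (rem * enabled) / running;
}
```

With an event that counted 150 while running for 100 time units out of
300 enabled, this yields 450. If 'enabled' were stale for an INACTIVE
event, the estimate would come out too low, which is the kind of error
being discussed here.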

> Am I missing something?

Where do we actually need the crap that is commit f79256532682 ?