Re: [PATCH v1] perf evlist: Avoid frequency mode for the dummy event

From: Mingwei Zhang
Date: Tue Oct 03 2023 - 19:20:04 EST


On Tue, Oct 3, 2023 at 3:36 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Tue, Oct 3, 2023 at 1:08 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > On Wed, Sep 20, 2023 at 10:05 PM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Sep 18, 2023 at 3:43 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > >
> > > > On Sat, Sep 16, 2023 at 5:46 PM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
> > > > > Thank you very much for the change. I have one quick question about
> > > > > the PMU unthrottling logic. When I am looking into the function
> > > > > perf_adjust_freq_unthr_context(), I see the loop with PMU stop and
> > > > > start in each iteration. Is there a good way to avoid this PMU reset
> > > > > operation while quickly figuring out the event in frequency mode?
> > > >
> > > > Agreed. I think before the pmu_disable could be avoided for this condition:
> > > > ```
> > > > if (event->hw.interrupts != MAX_INTERRUPTS &&
> > > > (!event->attr.freq || !event->attr.sample_freq))
> > > > continue;
> > > > ```
> > > > Fixing up the event stop/start looks harder.
> > > >
> > >
> > > Right, I think putting the check early before pmu_disable() is already
> > > a great optimization. The only concern I initially had was whether
> > > event->hw.interrupts can be accessed before we disable the pmu. But
> > > after checking this field in other locations, I don't see any problem
> > > at all.
> >
> > The event->hw.interrupts would be increased in the NMI handler
> > so there is a race between the check and the NMI. That's why
> > I think it checks that after disabling the PMU.
> >
> > But I think we can skip non-sampling events for sure. Then it
> > would be better to set attr.sample_period = 0 rather than attr.freq.
> >
> > if (!is_sampling_event(event))
> > continue;
> >
> > perf_pmu_disable(event->pmu);
> > ...
> >
> > Thanks,
> > Namhyung
>
> With the PMU disabled, isn't there still a risk of an interrupt still
> being in flight? In other words the disable doesn't prevent a race and
> we'll catch this on the next timer call to
> perf_adjust_freq_unthr_context. I think we can also improve the code
> by just disabling a PMU once, we can take advantage of the
> perf_event_pmu_context and disable that PMU, iterate its events and
> then re-enable the PMU - i.e. no need for an enable and disable per
> event. I'll put a patch together.
>
> Thanks,
> Ian

+Jim Mattson

I initially thought this idea was just an alternative, or a more
professional fix in the perf subsystem. I was wrong...

This would be way better than just skipping frequency events in the
loop. Since if we just skip by event, we may still suffer from huge
overhead if the event list contains many sampling events in frequency
mode. Unfortunately, that is the general case when we do perf record
-e 'eventlist' (IIUC all events in eventlist are in frequency mode if
we don't specify period=). So the problem actually remains whenever we
do perf sampling unless we use something like Intel vtune.

On the other hand, since all of the events are presumably CPU core
events, with the fix we pay only once for the PMU reset per hrtimer
regardless of how many events are in frequency mode.

Looking forward to the patch! Please keep us posted if possible.

Thanks.
-Mingwei