RE: [PATCH 01/10] perf record: new interfaces to read ring buffer to file

From: Liang, Kan
Date: Wed Oct 11 2017 - 11:16:08 EST


> Em Wed, Oct 11, 2017 at 04:12:42AM +0000, Liang, Kan escreveu:
> > > > /* When check_messup is true, 'end' must points to a good entry */
> > > > static union perf_event * perf_mmap__read(struct perf_mmap *md,
> bool
> > > > check_messup, u64 start, diff --git a/tools/perf/util/evlist.h
> > > > b/tools/perf/util/evlist.h index b1c14f1..1ce4857 100644
> > > > --- a/tools/perf/util/evlist.h
> > > > +++ b/tools/perf/util/evlist.h
> > > > @@ -39,6 +39,16 @@ struct perf_mmap {
> > > > char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > > > };
> > > >
> > > > +struct perf_mmap_read {
> > > > + struct perf_mmap *md;
> > > > + u64 head;
> > > > + u64 start;
> > > > + u64 end;
> > >
> > > So there will be always a one-on-one association of 'struct
> perf_mmap_read'
> > > and 'struct perf_mmap', why not go on adding more fields to 'struct
> > > perf_mmap' as we need
> >
> > The fields in 'struct perf_mmap' needs to be recalculated before each
> reading.
> > So I put them in a new struct.
>
> Ok, but I still think that if there is a one on one relatioship of
> perf_mmap_read with perf_mmap, then we should just extend the one we
> already have for per-mmap operations, i.e. 'struct perf_mmap', I'll try
> and provide a patch on top of my perf/core branch to see how it looks.
>
> > > but not doing it all at once (backward, snapshotting,
> > > overwrite, etc) but first the simple part, make the most basic mode:
> > >
> > > perf record -a
> > >
> > > perf top
> > >
> > > work, multithreaded, leaving the other more complicated modes
> fallbacking
> > > to the old format, then when we have it solid, go on getting the other
> > > features.
> >
> > Agree.
> > When I did perf top optimization, I also tried Namhyung's perf top multi-
> thread patch.
> > https://lwn.net/Articles/667469/
> > I think it may be a good start point.
>
> I have to read that to understand why we need those indexes :-\
>
> > I didn't work on his patch. Because the root cause of bad perf top
> performance
> > is non overwrite mode, which generate lots of samples shortly. It exceeds
> KNL's
> > computational capability. Multi-threading doesn't help much on this case.
> > So I tried to use overwrite mode then.
>
> Right, work on the problem you have at hand, but all these efforts
> should be considered to move forward.
>
> > > In the end, having the two formats supported will be needed anyway,
> and
> > > we can as well ask for processing with both perf.data file formats to
> compare
> > > results, while we strenghten out the new code.
> > >
> > > I just think we should do this in a more fine grained way to avoid too
> much
> > > code churn as well as having a fallback to the old code, that albeit non
> > > scalable, is what we have been using and can help in certifying that the
> new
> > > one works well, by comparing its outputs.
> >
> > I already extended the multithreading support for event synthesization in
> perf
> > record.
> > https://github.com/kliang2/perf.git perf_record_opt
> > I will send it out for review shortly after rebasing on the latest perf/core.
> >
> > In the patch series, I realloc buffer for each thread to temporarily keep the
> > processing result, and write them to the perf.data at the end of event
> > synthesization. The number of synthesized event is not big (hundreds of
> > Kilobyte). So I think it should be OK to do that.
>
> Ok, one thing I noticed was that with the snapshotting code we
> synthesize events multiple times, once per each new perf.data file, I
> haven't tested that with the multithreaded synthesizing code we recently
> merged, have you?

Not yet. I will do the test for perf record.

Thanks,
Kan