RE: [PATCH V2 03/13] perf/x86: output sampling overhead

From: Liang, Kan
Date: Wed Dec 07 2016 - 14:03:50 EST




> On Tue, Dec 06, 2016 at 03:47:40PM +0000, Liang, Kan wrote:
>
> > > It doesn't record anything, it generates the output. And it doesn't
> > > explain why that needs to be in pmu::del(), in general that's a horrible
> > > thing to do.
> >
> > Yes, it only generates/logs the output. Sorry for the confusing wording.
> >
> > The NMI overhead is a PMU-specific overhead, so the NMI overhead output
> > should be generated in the PMU code.
>
> True, but you're also accounting in a per-cpu bucket, which means it
> includes all events. At which point the per-event overhead thing doesn't
> really make sense.
>
> It also means that previous sessions influence the numbers of our current
> session; there's no explicit reset of the numbers.
>
> > I assume that pmu::del is the last PMU function called when perf finishes.
> > Is it a good place for logging?
>
> No, it's horrible. Sure, we'll call pmu::del on events, but yuck.
>
> You really only want _one_ invocation when you stop using the event, and
> we don't really have a good place for that. But instead of creating one, you
> do horrible things.
>
> Now, I realize there's a bit of a catch-22 in that the moment we know the
> event is going away, it's already gone from userspace. So we cannot dump
> data from there in general.
>
> However, if we have output redirection we can, but that would make things
> depend on that and it cannot be used for the last event whose buffer we're
> using.
>
> Another option would be to introduce PERF_EVENT_IOC_STAT or something
> like that, and have the tool call that when it's 'done'.
>

OK. I think I will implement a new ioctl, PERF_EVENT_IOC_STAT.

The tool will call IOC_STAT when it starts and when it is done.
I will also introduce two new ioctl flags
(PERF_IOC_FLAG_STAT_START and PERF_IOC_FLAG_STAT_DONE).
On 'start', the kernel will reset the numbers.
On 'done', the kernel will generate all the outputs. The overhead numbers come
from different CPUs. To distinguish them, we have to add a cpu field to
perf_overhead_entry, because we cannot trust sample_id.
struct perf_overhead_entry {
	__u32	cpu;
	__u32	nr;
	__u64	time;
};
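A minimal sketch of why the cpu field matters on the tool side: on 'done' the tool receives entries from several CPUs and has to fold them per CPU using the cpu field carried in each entry, not sample_id. It assumes the struct layout proposed above (with uint32_t/uint64_t standing in for __u32/__u64) and assumes time is in nanoseconds, since the proposal does not state the unit. The helper names (aggregate_overhead, struct overhead_total) and the MAX_CPUS bound are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as proposed in the mail; uint32_t/uint64_t stand in for __u32/__u64. */
struct perf_overhead_entry {
	uint32_t cpu;	/* CPU the overhead was measured on */
	uint32_t nr;	/* number of occurrences */
	uint64_t time;	/* total time spent (unit assumed to be ns) */
};

#define MAX_CPUS 8	/* hypothetical bound for this sketch */

/* Hypothetical per-CPU totals the tool builds from the 'done' output. */
struct overhead_total {
	uint64_t nr;
	uint64_t time;
};

/*
 * Fold a batch of entries into per-CPU totals, keyed by the cpu field
 * carried in each entry. Returns how many entries were skipped because
 * their cpu was outside the range this sketch handles.
 */
static size_t aggregate_overhead(const struct perf_overhead_entry *e, size_t n,
				 struct overhead_total totals[MAX_CPUS])
{
	size_t skipped = 0;

	assert(totals != NULL);
	for (size_t i = 0; i < n; i++) {
		if (e[i].cpu >= MAX_CPUS) {
			skipped++;
			continue;
		}
		totals[e[i].cpu].nr += e[i].nr;
		totals[e[i].cpu].time += e[i].time;
	}
	return skipped;
}
```

Entries for the same CPU may arrive more than once (for example from different overhead types), so the sketch accumulates rather than overwrites.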

I will also add a callback, void (*overhead_stat), to struct pmu to do the
PMU-specific reset and output generation.
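A rough user-space model of how the generic ioctl path could dispatch to the proposed overhead_stat hook: 'start' clears numbers left over from earlier sessions (the stale-data problem raised in the review), and 'done' emits one record per CPU. The flag names come from the proposal, but their values, the hook's parameters, struct pmu_sketch, perf_event_stat() and example_pmu_overhead_stat() are all hypothetical; this is not the kernel's struct pmu or ioctl code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values; only the names come from the proposal. */
#define PERF_IOC_FLAG_STAT_START	(1U << 0)
#define PERF_IOC_FLAG_STAT_DONE		(1U << 1)

#define NR_CPUS_SKETCH 4

struct overhead_bucket {
	uint32_t nr;
	uint64_t time;
};

/* Trimmed-down stand-in for struct pmu; only the new hook is modelled. */
struct pmu_sketch {
	const char *name;
	struct overhead_bucket nmi[NR_CPUS_SKETCH];	/* per-CPU NMI overhead */
	unsigned int emitted;				/* records generated on 'done' */
	void (*overhead_stat)(struct pmu_sketch *pmu, unsigned int flag);
};

/* Example PMU-specific implementation of the hook (hypothetical). */
static void example_pmu_overhead_stat(struct pmu_sketch *pmu, unsigned int flag)
{
	if (flag & PERF_IOC_FLAG_STAT_START) {
		/* 'start': drop numbers left over from earlier sessions */
		memset(pmu->nmi, 0, sizeof(pmu->nmi));
		pmu->emitted = 0;
	}
	if (flag & PERF_IOC_FLAG_STAT_DONE) {
		/* 'done': emit one record per CPU that saw any overhead */
		for (uint32_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
			if (!pmu->nmi[cpu].nr)
				continue;
			printf("%s cpu%u: nr=%u time=%llu\n", pmu->name,
			       (unsigned)cpu, (unsigned)pmu->nmi[cpu].nr,
			       (unsigned long long)pmu->nmi[cpu].time);
			pmu->emitted++;
		}
	}
}

/* Generic path: validate the flag, then call the hook if the PMU has one. */
static int perf_event_stat(struct pmu_sketch *pmu, unsigned int flag)
{
	if (!(flag & (PERF_IOC_FLAG_STAT_START | PERF_IOC_FLAG_STAT_DONE)))
		return -1;	/* stands in for -EINVAL */
	if (pmu->overhead_stat)
		pmu->overhead_stat(pmu, flag);
	return 0;
}
```

Leaving the hook NULL for PMUs without specific overhead keeps the generic path a no-op for them.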

In V2, the three overheads are stored in different per-event/per-cpu contexts.
For V3, I will store all the overheads in the PMU's cpuctx,
so the numbers will be the overhead of the PMU, not of the whole system.
That should be clearer and more useful.
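A small sketch of the V3 storage idea: each PMU owns per-CPU buckets for all of its overhead types, so overhead charged to one PMU never shows up in another PMU's totals. The three overhead kinds are given assumed labels (NMI, MUX, SB); struct pmu_sketch, struct pmu_cpu_ctx_sketch, account_overhead() and pmu_overhead_time() are hypothetical names, not the kernel's perf_cpu_context.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4

/* The three overhead kinds tracked in V2 (labels assumed for this sketch). */
enum overhead_type {
	OVERHEAD_NMI,
	OVERHEAD_MUX,
	OVERHEAD_SB,
	OVERHEAD_MAX,
};

struct overhead_bucket {
	uint32_t nr;
	uint64_t time;
};

/* Per-CPU context of one PMU: all of this PMU's overheads live here. */
struct pmu_cpu_ctx_sketch {
	struct overhead_bucket overhead[OVERHEAD_MAX];
};

struct pmu_sketch {
	struct pmu_cpu_ctx_sketch cpuctx[NR_CPUS_SKETCH];
};

/* Charge one occurrence taking @time of @type overhead to @pmu on @cpu. */
static void account_overhead(struct pmu_sketch *pmu, unsigned int cpu,
			     enum overhead_type type, uint64_t time)
{
	assert(cpu < NR_CPUS_SKETCH && type < OVERHEAD_MAX);
	pmu->cpuctx[cpu].overhead[type].nr++;
	pmu->cpuctx[cpu].overhead[type].time += time;
}

/* Sum one overhead type across all CPUs, for a single PMU only. */
static uint64_t pmu_overhead_time(const struct pmu_sketch *pmu,
				  enum overhead_type type)
{
	uint64_t sum = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += pmu->cpuctx[cpu].overhead[type].time;
	return sum;
}
```

Because the buckets hang off the PMU rather than off individual events, the totals describe that PMU across all of its events on each CPU.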

How does that sound?

Thanks,
Kan