Re: [RFC PATCH] perf: Provide status of known PMUs

From: Stephane Eranian
Date: Fri Jul 10 2015 - 15:00:04 EST


Hi,

On Fri, Jul 10, 2015 at 1:35 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Thu, Jul 09, 2015 at 02:32:05PM +0200, Ingo Molnar wrote:
>> >
>> > perf record error: The 'bts' PMU is not available, because the CPU does not support it
>>
>> This one makes sense.
>>
>> > perf record error: The 'bts' PMU is not available, because this architecture does not support it
>> > perf record error: The 'bts' PMU is not available, because its driver is not built into the kernel
>> >
>> > Because if it's the wrong architecture or CPU, I look for a box with the right
>> > one, if it's simply the kernel not having the necessary PMU driver then I'll boot
>> > a kernel with it enabled.
>>
>> These not so much; why won't a generic: "Unknown PMU, check arch/kernel" do?
>
> Yeah, I mean why not make the user's job harder if we can? We really don't want to
> solve this problem technically and we _really_ want tooling to be fundamentally
> unhelpful, right? ;-)
>
> I realize that the 'Error: there was a bug, aborting' style of sado-masochistic
> error messages are the current Linux tooling status quo, which opaque error
> feedback comes from an early technological mistake of Unix system calls screwing
> up error handling, and I also see that after decades of abuse people are showing
> signs of the Stockholm Syndrome related to this problem, but it _really_ does not
> have to be so ...
>
> Whenever we can we should change such bad patterns.
>
>> The thing is, I hate that hard-coded list, its pain I don't need.
>
> Absolutely! I pointed this out during review as well.
>
> It does not impact the core concept though: we should have a single numeric error,
> and free form error strings provided by the place that first triggers some
> problem. That should be both programmatically easy to handle and maximally
> informative to the users.
>
> At least half of a tool's usability comes not from how it behaves when it works,
> but how it behaves when it does not. (SystemD, I'm looking at you.)
>
This patch looks useful, but it does not address a related issue. Here
you are reporting on the status of specific PMU support, i.e., whether
the PMU is supported by the hardware. But there is another problem that
I run into very often on ARM (e.g., on Tegra), and it really annoys me.
The PMU hardware is present, but the PMU instance for a given CPU is
not, simply because the CPU is hotpluggable and it is offline at the
time the tool (perf) starts. I am not talking about explicit hotplugging
by the user, but about hotplugging done by the kernel. Then, during the
run, the kernel plugs the CPU back in to handle the load. Perf misses
monitoring that CPU completely, so it does not measure what is really
going on.

I understand that reporting that a PMU instance is supported but
offline does not solve the entire problem. There needs to be some other
kernel support. But I think it would be good to have the tool at least
issue a warning saying: "some CPUs are offline, not monitoring all CPUs,
results may be partial".
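
To make the warning idea concrete, here is a rough userspace sketch, not
perf's actual code. It assumes the tool can read the sysfs cpulist file
/sys/devices/system/cpu/offline at startup; the helper names
count_cpus_in_list() and warn_if_cpus_offline() are made up for
illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the CPUs named in a kernel-style cpulist string such as "1-3,5".
 * Returns 0 for an empty list and -1 if the string is malformed.
 * (Hypothetical helper, not part of perf.)
 */
static int count_cpus_in_list(const char *list)
{
	const char *p = list;
	int count = 0;

	while (*p && *p != '\n') {
		char *end;
		long first = strtol(p, &end, 10);
		long last = first;

		if (end == p || first < 0)
			return -1;
		if (*end == '-') {		/* range such as "1-3" */
			p = end + 1;
			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
		}
		count += (int)(last - first + 1);
		if (*end == ',')
			end++;
		else if (*end != '\0' && *end != '\n')
			return -1;
		p = end;
	}
	return count;
}

/*
 * Read a cpulist file and print a warning to 'out' if it names any CPU.
 * Returns the number of offline CPUs, or -1 if the file cannot be read
 * or parsed. (Hypothetical helper, not part of perf.)
 */
static int warn_if_cpus_offline(const char *path, FILE *out)
{
	char buf[4096];
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';		/* empty file: nothing is offline */
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';

	n = count_cpus_in_list(buf);
	if (n > 0)
		fprintf(out, "warning: some CPUs are offline (%s), "
			"not monitoring all CPUs, results may be partial\n",
			buf);
	return n;
}
```

A tool would call warn_if_cpus_offline("/sys/devices/system/cpu/offline",
stderr) once before opening its events. This only covers the state at
startup; a CPU that the kernel brings online later in the run would
still need the other kernel support mentioned above.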