Re: Basic perf PMU support for Haswell v11

From: Ingo Molnar
Date: Thu May 02 2013 - 04:39:14 EST



[ FYI, we are still in the merge window, when maintainers are very busy, so
don't expect quick replies to mails that are not about merge-window-related
patches and commits. Those issues are typically handled after
-rc1 has been released, once most of the merge fallout in the upstream
kernel has been resolved. ]

* Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:

> > How well was this
> > patch-set tested on non-Haswell hardware, which makes up 99.99% of our
> > installed base?
>
> I tested on a couple of systems now and then: usually Haswell and IvyBridge,
> sometimes also Westmere and Atom. I don't retest every iteration since,
> as you know, most of the changes you're requesting don't affect
> the binary.
>
> My test bed is likely to be smaller than yours, though, and as you well
> know, some of the kernel QA happens after release, as usual.
>
> >
> > In particular, after applying your patches, 'perf top' stopped working on
> > an Intel testbox of mine:
> >
> > processor : 15
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 26
> > model name : Intel(R) Xeon(R) CPU X55600 @ 2.80GHz
>
> I assume the second 0 is a typo?

Probably a typo in the BIOS.

> > stepping : 5
>
> > 'perf top' just does not produce any profiling output - it says 0 events.
>
> Thanks for testing.
>
> I found a similar system (not the same stepping, but the same model) and
> verified that perf top works fine there, as well as on a couple of other systems.
>
> Since I cannot reproduce I would need your help debugging it.
>
> I assume it worked before my patches.

Yes, obviously.

Here's another easy to test symptom of the bug:

$ perf record ./hackbench 10
Time: 0.097
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.043 MB perf.data (~1866 samples) ]

$ perf report --stdio
Error:
The perf.data file has no samples!

The expected result is a profile displayed by 'perf report'.

> [...] If you don't know, please double check. Also, I assume there's no
> general problem between the user-land perf you used and the kernel.
>
> The only patch I could think of which may affect other systems
> is the moving of the APIC ack.

Btw., I warned you about the delicate placement of the APIC ACK in my
Haswell patches review feedback mail, months ago:

https://lkml.org/lkml/2013/2/13/78

which mail you never replied to and which warning you apparently ignored.

When modifying the PMU ACK sequence, please find the relevant part of the
Intel SDM that recommends a different ACK sequence from the one currently
implemented, and document this in the changelog.

I'm going to ignore your APIC ACK patch until you do it properly.

> So does it work if you revert
>
> perf, x86: Move NMI clearing to end of PMI handler after ...
>
> If that is the cause, we could whitelist it for Haswell.

No, reverting that patch did not fix the bug.

I have bisected it down to this patch of yours:

"perf/x86: Add Haswell PMU support"

Most of that patch has no effect on non-Haswell machines, so the scope of
problematic changes should be pretty small.

My quick guess is that your patch broke fixed counters.

If you find the bug or want me to test anything please send a delta patch,
relative to your last series - as I have parts of your patches applied
already locally with cleanups, etc.

Thanks,

Ingo