[PATCH] perf/x86/amd: Do not WARN on every IRQ

From: Breno Leitao
Date: Fri Jun 16 2023 - 07:53:26 EST


On some systems, the Performance Counter Global Status Register is
coming with reserved bits set, which causes the system to be unusable
if a simple `perf top` runs. The system hits the WARN() thousands times
while perf runs.

WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0

This happens because the "Performance Counter Global Status Register"
(PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
Manual, Volume 2: System Programming, 24593"[1]

WARN_ONCE if any reserved bit is set, and sanitize the value to what the
code is handling, so the overflow events continue to be handled for the
number of events that are known to be sane.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")

[1] Link: https://www.amd.com/system/files/TechDocs/24593.pdf
---
arch/x86/events/amd/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index bccea57dee81..809ddb15c122 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -909,6 +909,10 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
status &= ~GLOBAL_STATUS_LBRS_FROZEN;
}

+ amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
+ WARN_ON_ONCE(status & ~amd_pmu_global_cntr_mask);
+ status &= amd_pmu_global_cntr_mask;
+
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
if (!test_bit(idx, cpuc->active_mask))
continue;
--
2.34.1