Re: [PATCH v2] KVM: x86/intr: Explicitly check NMI from guest to eliminate false positives

From: Like Xu
Date: Sun Feb 18 2024 - 04:52:01 EST


On 7/2/2024 5:08 am, Sean Christopherson wrote:
On Tue, Feb 06, 2024, Sean Christopherson wrote:
+Oliver

On Wed, Dec 06, 2023, Like Xu wrote:
Note that when vm-exit is indeed triggered by PMI and before HANDLING_NMI
is cleared, it's also still possible that another PMI is generated on host.
Also for perf/core timer mode, false positives are still possible since
non-NMI sources of interrupts are not always being used by perf/core.
In both cases above, perf/core should correctly distinguish between real
RIP sources or even need to generate two samples, belonging to host and
guest separately, but that's perf/core's story for interested warriors.

Oliver has a patch[*] that he promised he would send "soon" (wink wink) to
properly fix events that are configured to exclude the guest. Unless someone
objects, I'm going to tweak the last part of the changelog to be:

Note that when VM-exit is indeed triggered by PMI and before HANDLING_NMI
is cleared, it's also still possible that another PMI is generated on host.
Also for perf/core timer mode, false positives are still possible since
non-NMI sources of interrupts are not always being used by perf/core.
For events that are host-only, perf/core can and should eliminate false
positives by checking event->attr.exclude_guest, i.e. events that are
configured to exclude KVM guests should never fire in the guest.
Events that are configured to count host and guest are trickier, perhaps
impossible to handle with 100% accuracy? And regardless of what accuracy
is provided by perf/core, improving KVM's accuracy is cheap and easy, with
no real downsides.

Never mind, this causes KUT's pmu_pebs test to fail:

FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.

It might be a test bug, but I have neither the time nor the inclination to
investigate.

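For the PEBS overflow case, in_nmi() returns 0x100000 from the core kernel,
i.e. a non-zero mask value rather than a strict 0/1 boolean, so comparing it
directly against the boolean result of the KVM_HANDLING_NMI check never
matches. The following diff fixes the issue: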

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 995760ba072f..dcf665251fce 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1891,7 +1891,7 @@ enum kvm_intr_type {
 /* Enable perf NMI and timer modes to work, and minimise false positives. */
 #define kvm_arch_pmi_in_guest(vcpu) \
 	((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
-	 (in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
+	 (!!in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
 
 void __init kvm_mmu_x86_module_init(void);
 int kvm_mmu_vendor_module_init(void);

Does it help? (The tests pass on ICX.)
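
To illustrate why normalising in_nmi() matters, here is a minimal user-space
sketch of the comparison. It is not kernel code: FAKE_NMI_BITS, the
fake_intr_type enum and both helper functions are hypothetical stand-ins,
and the in_nmi() result is passed in as a plain argument, using the 0x100000
value reported above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the non-zero value in_nmi() reported above. */
#define FAKE_NMI_BITS 0x100000UL

/* Hypothetical stand-in for enum kvm_intr_type. */
enum fake_intr_type {
	FAKE_HANDLING_NONE = 0,
	FAKE_HANDLING_IRQ  = 1,
	FAKE_HANDLING_NMI  = 2,
};

/* Old comparison: a raw mask value is compared against a 0/1 result. */
static bool pmi_in_guest_unnormalised(unsigned long in_nmi_val,
				      enum fake_intr_type type)
{
	return type && (in_nmi_val == (unsigned long)(type == FAKE_HANDLING_NMI));
}

/* New comparison: "!!" collapses any non-zero mask value to 1 first. */
static bool pmi_in_guest_normalised(unsigned long in_nmi_val,
				    enum fake_intr_type type)
{
	return type && (!!in_nmi_val == (type == FAKE_HANDLING_NMI));
}
```

With in_nmi_val == FAKE_NMI_BITS and type == FAKE_HANDLING_NMI, the
unnormalised helper compares 0x100000 against 1 and returns false, while the
normalised helper returns true. For the non-NMI case (in_nmi_val == 0,
type == FAKE_HANDLING_IRQ) both helpers agree, which would explain why only
the NMI-driven PEBS overflow path was affected.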



Like,

If you want any chance of your patches going anywhere but my trash folder, you
need to change your upstream workflow to actually run tests. I would give most
people the benefit of the doubt, e.g. assume they didn't have the requisite
hardware, or didn't realize which tests would be relevant/important. But this
is a recurring problem, and you have been warned, multiple times.

Sorry, my CI resources are diverted to other downstream projects.
But there's no doubt it's my fault and this behavior will be corrected.