Re: [BUG] Guest OSes die simultaneously (bisected)

From: Sean Christopherson
Date: Wed Jan 03 2024 - 19:24:26 EST


On Wed, Jan 03, 2024, Paul E. McKenney wrote:
> On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > Since some time between v5.19 and v6.4, long-running rcutorture tests
> > would (rarely but intolerably often) have all guests on a given host die
> > simultaneously with something like an instruction fault or a segmentation
> > violation.
> >
> > Each bisection step required 20 hosts running 10 hours each, and
> > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
> > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit
> > is certainly messing with things that could possibly cause all manner
> > of mischief, I don't immediately see a smoking gun. Except that the
> > commit prior to this one is rock solid.
> > Just to make things a bit more exciting, bisection in mainline proved
> > to be problematic due to bugs of various kinds that hid this one. I was
> > therefore forced to bisect among the commits backported to the internal
> > v5.19-based kernel, which fingered the backported version of the patch
> > called out above.
>
> Ah, and so why do I believe that this is a problem in mainline rather
> than just (say) a backporting mistake?
>
> Because this issue was first located in v6.4, which already has this
> commit included.
>
> Thanx, Paul
>
> > Please note that this is not (yet) an emergency. I will just continue
> > to run rcutorture on v5.19-based hypervisors in the meantime.
> >
> > Any suggestions for debugging or fixing?

This looks suspect:

+ u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
+ int global_ctrl, pebs_enable;

- arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
- arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
- arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
- arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
- *nr = 1;
+ *nr = 0;
+ global_ctrl = (*nr)++;
+ arr[global_ctrl] = (struct perf_guest_switch_msr){
+ .msr = MSR_CORE_PERF_GLOBAL_CTRL,
+ .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
+ .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
+ };


IIUC (always a big if with this code), the intent is that the guest's version of
PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not
being used for PEBS. (b) is necessary because PEBS generates records in memory
using virtual addresses, i.e. the CPU will write to memory using a virtual address
that is valid for the host but not the guest. And so PMU counters that are
configured to generate PEBS records need to be disabled while running the guest.

Before that commit, the logic was:

guest[PERF_GLOBAL_CTRL] = ctrl & ~host;
guest[PERF_GLOBAL_CTRL] &= ~pebs;

But after, it's now:

guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs);

I.e. the kernel is enabled counters in the guest that are not host-only OR not
PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive
to the host, then the new code will yield (truncated to a single byte for sanity)

1 = 1 & (0xf | 0xe)

and thus keep counter 0 enabled, whereas the old code would yield

1 = 1 & 0xf
0 = 1 & 0xe

A bit of a shot in the dark and completed untested, but I think this is the correct
fix?

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a08f794a0e79..92d5a3464cb2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
arr[global_ctrl] = (struct perf_guest_switch_msr){
.msr = MSR_CORE_PERF_GLOBAL_CTRL,
.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
- .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
+ .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask),
};

if (!x86_pmu.pebs)