Re: [BUG] Guest OSes die simultaneously (bisected)

From: Paul E. McKenney
Date: Wed Jan 03 2024 - 20:01:27 EST


On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote:
> On Wed, Jan 03, 2024, Paul E. McKenney wrote:
> > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > Since some time between v5.19 and v6.4, long-running rcutorture tests
> > > would (rarely but intolerably often) have all guests on a given host die
> > > simultaneously with something like an instruction fault or a segmentation
> > > violation.
> > >
> > > Each bisection step required 20 hosts running 10 hours each, and
> > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
> > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit
> > > is certainly messing with things that could possibly cause all manner
> > > of mischief, I don't immediately see a smoking gun. Except that the
> > > commit prior to this one is rock solid.
> > > Just to make things a bit more exciting, bisection in mainline proved
> > > to be problematic due to bugs of various kinds that hid this one. I was
> > > therefore forced to bisect among the commits backported to the internal
> > > v5.19-based kernel, which fingered the backported version of the patch
> > > called out above.
> >
> > Ah, and so why do I believe that this is a problem in mainline rather
> > than just (say) a backporting mistake?
> >
> > Because this issue was first located in v6.4, which already has this
> > commit included.
> >
> > Thanx, Paul
> >
> > > Please note that this is not (yet) an emergency. I will just continue
> > > to run rcutorture on v5.19-based hypervisors in the meantime.
> > >
> > > Any suggestions for debugging or fixing?
>
> This looks suspect:
>
> + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
> + int global_ctrl, pebs_enable;
>
> - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
> - *nr = 1;
> + *nr = 0;
> + global_ctrl = (*nr)++;
> + arr[global_ctrl] = (struct perf_guest_switch_msr){
> + .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> + };
>
>
> IIUC (always a big if with this code), the intent is that the guest's version of
> PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not
> being used for PEBS. (b) is necessary because PEBS generates records in memory
> using virtual addresses, i.e. the CPU will write to memory using a virtual address
> that is valid for the host but not the guest. And so PMU counters that are
> configured to generate PEBS records need to be disabled while running the guest.
>
> Before that commit, the logic was:
>
> guest[PERF_GLOBAL_CTRL] = ctrl & ~host;
> guest[PERF_GLOBAL_CTRL] &= ~pebs;
>
> But after, it's now:
>
> guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs);
>
> I.e. the kernel is enabled counters in the guest that are not host-only OR not
> PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive
> to the host, then the new code will yield (truncated to a single byte for sanity)
>
> 1 = 1 & (0xf | 0xe)
>
> and thus keep counter 0 enabled, whereas the old code would yield
>
> 1 = 1 & 0xf
> 0 = 1 & 0xe
>
> A bit of a shot in the dark and completed untested, but I think this is the correct
> fix?

I am firing off some tests, and either way, thank you very much!!!

Thanx, Paul

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index a08f794a0e79..92d5a3464cb2 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> arr[global_ctrl] = (struct perf_guest_switch_msr){
> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask),
> };
>
> if (!x86_pmu.pebs)
>