Re: [PATCH] KVM: x86: vPMU: truncate counter value to allowed width

From: Jim Mattson
Date: Fri Jun 30 2023 - 19:25:23 EST


On Fri, Jun 30, 2023 at 9:40 AM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>
> On Fri, Jun 30, 2023 at 8:21 AM Roman Kagan <rkagan@xxxxxxxxx> wrote:
> >
> > On Fri, Jun 30, 2023 at 07:28:29AM -0700, Sean Christopherson wrote:
> > > On Fri, Jun 30, 2023, Roman Kagan wrote:
> > > > On Thu, Jun 29, 2023 at 05:11:06PM -0700, Sean Christopherson wrote:
> > > > > @@ -74,6 +74,14 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
> > > > > return counter & pmc_bitmask(pmc);
> > > > > }
> > > > >
> > > > > +static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
> > > > > +{
> > > > > + if (pmc->perf_event && !pmc->is_paused)
> > > > > + perf_event_set_count(pmc->perf_event, val);
> > > > > +
> > > > > + pmc->counter = val;
> > > >
> > > > > Doesn't this still have the original problem of storing a wider value than
> > > > > allowed?
> > >
> > > Yes, this was just to fix the counter offset weirdness. My plan is to apply your
> > > patch on top. Sorry for not making that clear.
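
For readers following along, here is a minimal standalone sketch of the
truncation under discussion: masking a written value down to the counter's
width before storing it. This is not the KVM code. struct toy_pmc, its
width field, and both helpers are hypothetical names invented for
illustration, and the 48-bit width used in the usage note is only an
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_pmc; only the fields needed here. */
struct toy_pmc {
	unsigned int width;	/* counter width in bits (assumed, e.g. 48) */
	uint64_t counter;	/* stored counter value */
};

/*
 * Mask with the low 'width' bits set. A width of 64 is handled separately
 * because shifting a 64-bit value by 64 is undefined behavior in C.
 */
static inline uint64_t toy_pmc_bitmask(const struct toy_pmc *pmc)
{
	assert(pmc->width >= 1 && pmc->width <= 64);
	return pmc->width == 64 ? UINT64_MAX
				: (UINT64_C(1) << pmc->width) - 1;
}

/* Store only the bits that the counter can actually hold. */
static inline void toy_pmc_write_counter(struct toy_pmc *pmc, uint64_t val)
{
	pmc->counter = val & toy_pmc_bitmask(pmc);
}
```

With width set to 48, writing a value that has bit 48 set plus a low-order
5 leaves 5 in the counter, so a later read never sees bits above the
counter width.
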
> >
> > Ah, got it, thanks!
> >
> > Also I'm now chasing a problem that we occasionally see
> >
> > [3939579.462832] Uhhuh. NMI received for unknown reason 30 on CPU 43.
> > [3939579.462836] Do you have a strange power saving mode enabled?
> > [3939579.462836] Dazed and confused, but trying to continue
> >
> > in the guests when perf is used. These messages disappear when
> > 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") is
> > reverted. I haven't yet figured out where exactly the culprit is.
>
> Maybe this is because KVM doesn't virtualize
> IA32_DEBUGCTL.Freeze_PerfMon_On_PMI?

Never mind. Linux doesn't set IA32_DEBUGCTL.Freeze_PerfMon_On_PMI.

> Consider:
>
> 1. PMC0 overflows, GLOBAL_STATUS[0] is set, and an NMI is delivered.
> 2. Before the guest's PMI handler clears GLOBAL_CTRL, PMC1 overflows,
> GLOBAL_STATUS[1] is set, and an NMI is queued for delivery after the
> next IRET.
> 3. The guest's PMI handler clears GLOBAL_CTRL, reads 3 from
> GLOBAL_STATUS, writes 3 to GLOBAL_OVF_CTRL, re-enables GLOBAL_CTRL,
> and IRETs.
> 4. The queued NMI is delivered, but GLOBAL_STATUS is now 0. No one
> claims the NMI, so we get the spurious NMI message.
>
> I don't know why this would require counting the retirement of
> emulated instructions. It seems that hardware PMC overflow in the
> early part of the guest's PMI handler would also be a problem.
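
The four steps above can be replayed in a toy model. This is only a sketch
under simplifying assumptions: a single latched NMI, a handler that acks
every bit it reads from GLOBAL_STATUS, and no GLOBAL_CTRL modeling. All
names (struct toy_guest, toy_overflow, and so on) are hypothetical and
exist only for this illustration; none of this is KVM or Linux perf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy guest state; models only what steps 1-4 need. */
struct toy_guest {
	uint64_t global_status;		/* models GLOBAL_STATUS */
	bool in_nmi;			/* NMI handler running, IRET not yet done */
	bool nmi_pending;		/* one NMI latched for delivery after IRET */
	unsigned int unknown_nmis;	/* NMIs that no handler claimed */
};

/* A counter overflows: set its status bit and raise an NMI. */
static void toy_overflow(struct toy_guest *g, unsigned int idx)
{
	assert(idx < 64);
	g->global_status |= UINT64_C(1) << idx;
	if (g->in_nmi)
		g->nmi_pending = true;	/* blocked until the next IRET */
	else
		g->in_nmi = true;	/* delivered immediately */
}

/*
 * PMI handler body: ack every bit currently set in GLOBAL_STATUS.
 * Returns true if the NMI was claimed, i.e. at least one bit was set.
 */
static bool toy_pmi_handler(struct toy_guest *g)
{
	uint64_t status = g->global_status;

	g->global_status &= ~status;	/* models the write to GLOBAL_OVF_CTRL */
	if (!status)
		g->unknown_nmis++;
	return status != 0;
}

/* IRET: unblock NMIs and deliver the latched one, if any. */
static void toy_iret(struct toy_guest *g)
{
	g->in_nmi = false;
	if (g->nmi_pending) {
		g->nmi_pending = false;
		g->in_nmi = true;
	}
}

/* Replay steps 1-4 and return the number of unclaimed NMIs. */
static unsigned int toy_replay(void)
{
	struct toy_guest g = { 0 };

	toy_overflow(&g, 0);	/* step 1: PMC0 overflows, NMI delivered */
	toy_overflow(&g, 1);	/* step 2: PMC1 overflows, NMI latched */
	toy_pmi_handler(&g);	/* step 3: reads 3, acks both bits */
	toy_iret(&g);		/* latched NMI is delivered here */
	toy_pmi_handler(&g);	/* step 4: status is 0, nobody claims it */
	toy_iret(&g);
	return g.unknown_nmis;
}
```

In this model toy_replay() counts exactly one unclaimed NMI, which
corresponds to the "NMI received for unknown reason" message in the guest.
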
>
> > Thanks,
> > Roman.