Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: David Woodhouse
Date: Mon Oct 02 2023 - 14:16:34 EST


On Mon, 2023-10-02 at 09:37 -0700, Sean Christopherson wrote:
> On Mon, Oct 02, 2023, David Woodhouse wrote:
> > On Fri, 2023-09-29 at 13:15 -0700, Dongli Zhang wrote:
> > >
> > > 1. The vcpu->hv_clock (kvmclock) is based on its own mult/shift/equation.
> > >
> > > 2. The raw monotonic (tsc_clocksource) uses different mult/shift/equation.
> > >
> >
> > That just seems wrong. I don't mean that you're incorrect; it seems
> > *morally* wrong.
> >
> > In a system with X86_FEATURE_CONSTANT_TSC, why would KVM choose to use
> > a *different* mult/shift/equation (your #1) to convert TSC ticks to
> > nanoseconds than the host CLOCK_MONOTONIC_RAW does (your #2).
> >
> > I understand that KVM can't track the host's CLOCK_MONOTONIC, as it's
> > adjusted by NTP. But CLOCK_MONOTONIC_RAW is supposed to be consistent.
> >
> > Fix that, and the whole problem goes away, doesn't it?
> >
> > What am I missing here, that means we can't do that?
>
> I believe the answer is that "struct pvclock_vcpu_time_info" and its math are
> ABI between KVM and KVM guests.
>
> Like many of the older bits of KVM, my guess is that KVM's behavior is the product
> of making things kinda sorta work with old hardware, i.e. was probably the least
> awful solution in the days before constant TSCs, but is completely nonsensical on
> modern hardware.

I still don't understand. The ABI and its math are fine. The ABI is just
"at time X the TSC was Y, and the TSC frequency is Z"

I understand why on older hardware, those values needed to *change*
occasionally when TSC stupidity happened.

But on newer hardware, surely we can set them precisely *once* when the
VM starts, and never ever have to change them again? Theoretically not
even when we pause the VM, kexec into a new kernel, and resume the VM!

But we *are* having to change it, because apparently
CLOCK_MONOTONIC_RAW is doing something *other* than incrementing at
precisely the frequency of the known and constant TSC.

But *why* is CLOCK_MONOTONIC_RAW doing that? I thought that the whole
point of CLOCK_MONOTONIC_RAW was to be consistent and not adjusted by
NTP etc.? Shouldn't it run at precisely the same rate as the kvmclock,
with no skew at all?

And if CLOCK_MONOTONIC_RAW is not what I thought it was... do we really
have to keep resetting the kvmclock to it at all? On modern hardware
can't the kvmclock be defined by the TSC alone?

Attachment: smime.p7s
Description: S/MIME cryptographic signature