Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: Sean Christopherson
Date: Mon Oct 02 2023 - 12:37:15 EST


On Mon, Oct 02, 2023, David Woodhouse wrote:
> On Fri, 2023-09-29 at 13:15 -0700, Dongli Zhang wrote:
> >
> >
> > We want more frequent KVM_REQ_MASTERCLOCK_UPDATE.
> >
> > This is because:
> >
> > 1. The vcpu->hv_clock (kvmclock) is based on its own mult/shift/equation.
> >
> > 2. The raw monotonic (tsc_clocksource) uses different mult/shift/equation.
> >
> > 3. As a result, given the same rdtsc, kvmclock and raw monotonic may return
> > different results (this is expected because they have different
> > mult/shift/equation).
> >
> > 4. However, the base values in the kvmclock calculation (tsc_timestamp and
> > system_time) are derived from the raw monotonic clock (master clock).
>
> That just seems wrong. I don't mean that you're incorrect; it seems
> *morally* wrong.
>
> In a system with X86_FEATURE_CONSTANT_TSC, why would KVM choose to use
> a *different* mult/shift/equation (your #1) to convert TSC ticks to
> nanoseconds than the host CLOCK_MONOTONIC_RAW does (your #2)?
>
> I understand that KVM can't track the host's CLOCK_MONOTONIC, as it's
> adjusted by NTP. But CLOCK_MONOTONIC_RAW is supposed to be consistent.
>
> Fix that, and the whole problem goes away, doesn't it?
>
> What am I missing here, that means we can't do that?

I believe the answer is that "struct pvclock_vcpu_time_info" and its math are
ABI between KVM and KVM guests.
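
For reference, the two conversions being compared boil down to roughly the
sketch below. This is a simplified illustration, not the kernel's code:
"struct pvclock_sketch" and all function names here are made up, and the real
"struct pvclock_vcpu_time_info" also carries version, flags and padding fields.
The first pair of helpers follows the pvclock-style math (pre-shift the TSC
delta, then apply a 32.32 fixed-point multiply), the last one follows the
clocksource-style (cycles * mult) >> shift.

```c
#include <stdint.h>

/* Illustrative subset of the fields used by the guest-side conversion. */
struct pvclock_sketch {
	uint64_t tsc_timestamp;     /* TSC value when the snapshot was taken */
	uint64_t system_time;       /* nanoseconds at the snapshot */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* pvclock-style scaling: shift the delta, then (delta * mul) >> 32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	uint64_t lo, hi;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Split the 64x32 multiply so the intermediate fits in 64 bits. */
	lo = (delta & 0xffffffffULL) * mul;
	hi = (delta >> 32) * mul;
	return (lo >> 32) + hi;
}

/* Guest view: snapshot base plus the scaled TSC delta. */
static uint64_t pvclock_read_ns(const struct pvclock_sketch *pv, uint64_t tsc)
{
	return pv->system_time +
	       scale_delta(tsc - pv->tsc_timestamp,
			   pv->tsc_to_system_mul, pv->tsc_shift);
}

/*
 * Clocksource-style scaling: (cycles * mult) >> shift.
 * Assumes cycles * mult does not overflow 64 bits.
 */
static uint64_t clocksource_cyc2ns_sketch(uint64_t cycles, uint32_t mult,
					  uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

With invented values for a 2 GHz TSC, mul = 0x80000000 / shift = 0 on the
pvclock side and mult = 8388609 / shift = 24 on the clocksource side (one unit
of rounding away from an exact divide-by-two), the two conversions disagree by
256 ns after 2^32 cycles. The gap grows with the size of the TSC delta, so the
two clocks drift apart until the base is refreshed.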

Like many of the older bits of KVM, my guess is that KVM's behavior is the product
of making things kinda sorta work with old hardware, i.e. it was probably the least
awful solution in the days before constant TSCs, but is completely nonsensical on
modern hardware.

> Alternatively... with X86_FEATURE_CONSTANT_TSC, why do the sync at all?
> If KVM wants to decide that the TSC runs at a different frequency to
> the frequency that the host uses for CLOCK_MONOTONIC_RAW, why can't KVM
> just *stick* to that?

Yeah, bouncing around guest time when the TSC is constant seems counterproductive.

However, why does any of this matter if the host has a constant TSC? If that's
the case, a sane setup will expose a constant TSC to the guest and the guest will
use the TSC instead of kvmclock for the guest clocksource.

Dongli, is this for long-lived "legacy" guests that were created on hosts without
a constant TSC? If not, then why is kvmclock being used? Or heaven forbid, are
you running on hardware without a constant TSC? :-)
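
One way to check which clocksource a guest is actually using is to read the
sysfs file from inside the guest. A minimal sketch, where read_clocksource()
is a hypothetical helper written for this example; a guest on kvmclock reports
"kvm-clock" and a guest on the TSC reports "tsc":

```c
#include <stdio.h>
#include <string.h>

/* Standard sysfs location of the active clocksource name. */
#define CURRENT_CLOCKSOURCE_PATH \
	"/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Hypothetical helper: copy the first line of @path into @buf without the
 * trailing newline. Returns 0 on success, -1 on any failure.
 */
static int read_clocksource(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling read_clocksource(CURRENT_CLOCKSOURCE_PATH, name, sizeof(name)) in the
guest and comparing the result against "kvm-clock" would tell us whether
kvmclock is in play at all.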

Not saying we shouldn't sanitize the kvmclock behavior, but knowing the exact
problematic configuration(s) will help us make a better decision on how to fix
the mess.