Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: Sean Christopherson
Date: Tue Oct 03 2023 - 20:07:28 EST


On Tue, Oct 03, 2023, David Woodhouse wrote:
>
>
> On 3 October 2023 01:53:11 BST, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >I think there is still use for synchronizing with the host's view of time, e.g.
> >to deal with lost time across host suspend+resume.
> >
> >So I don't think we can completely sever KVM's paravirt clocks from host time,
> >at least not without harming use cases that rely on the host's view to keep
> >accurate time. And honestly at that point, the right answer would be to stop
> >advertising paravirt clocks entirely.
> >
> >But I do think we can address the issues that Dongli and David are observing
> >where guest time drifts even though the host kernel's base time hasn't changed.
> >If I've pieced everything together correctly, the drift can be eliminated simply
> >by using the paravirt clock algorithm when converting the delta from the raw TSC
> >to nanoseconds.
> >
> >This is *very* lightly tested, as in it compiles and doesn't explode, but that's
> >about all I've tested.
>
> Hm, I don't think I like this.

Yeah, I don't like it either. I'll respond to your other mail with details, but
this is a dead end anyway.

> You're making get_monotonic_raw() not *actually* return the monotonic_raw
> clock, but basically return the kvmclock instead? And why? So that when KVM
> attempts to synchronize the kvmclock to the monotonic_raw clock, it gets
> tricked into actually synchronizing the kvmclock to *itself*?
>
> If you get this right, don't we have a fairly complex piece of code that has
> precisely *no* effect?
>
> Can't we just *refrain* from synchronizing the kvmclock to *anything*, in the
> CONSTANT_TSC case? Why do we do that anyway?
>
> (Suspend/resume, live update and live migration are different. In *those*
> cases we may need to preserve both the guest TSC and kvmclock based on either
> the host TSC or CLOCK_TAI. But that's different.)

The issue is that the timekeeping code doesn't provide a notification mechanism
to *just* get updates for things like suspend/resume. We could maybe do something
in KVM like unregister the notifier if the TSC is constant, and manually refresh
on suspend/resume. But that's pretty gross too, and I'd definitely be concerned
that we missed something.
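
For reference, the "paravirt clock algorithm" mentioned in the quoted mail is
the fixed-point conversion a guest applies to a TSC delta, using the
tsc_shift and tsc_to_system_mul fields of struct pvclock_vcpu_time_info.
Below is a rough userspace sketch of that math only, not the kernel code:
the function name scale_tsc_delta() is made up for illustration, and
unsigned __int128 is assumed to be available (a GCC/Clang extension).

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a raw TSC delta to nanoseconds the way a
 * pvclock consumer does:
 *
 *   ns = ((delta << shift) * mul_frac) >> 32
 *
 * A negative shift means shift right instead of left. mul_frac is a
 * 32-bit binary fraction, hence the final >> 32.
 */
static uint64_t scale_tsc_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Use a 128-bit intermediate so the 64x32 multiply cannot overflow. */
	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}
```

E.g. for a 2 GHz TSC, shift = 0 and mul_frac = 0x80000000 (i.e. 0.5) map
2000000000 cycles to 1000000000 ns. Because the multiplier and shift are
rounded, the result can differ slightly from what a different conversion of
the same cycle count produces, which is where the drift between the two
views of time comes from.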