Re: [PATCH 1/1] selftests: KVM: add test to print boottime wallclock

From: Sean Christopherson
Date: Fri Oct 20 2023 - 11:22:28 EST


On Fri, Oct 20, 2023, Dongli Zhang wrote:
> Hi Sean and Andrew,
>
> On 10/18/23 23:51, Andrew Jones wrote:
> > On Wed, Oct 18, 2023 at 12:51:55PM -0700, Sean Christopherson wrote:
> >> On Fri, Oct 06, 2023, Dongli Zhang wrote:
> >>> As inspired by the discussion in [1], the boottime wallclock may drift due
> >>> to the fact that the masterclock (or host monotonic clock) and kvmclock are
> >>> calculated based on the algorithms in different domains.
> >>>
> >>> Introduce a test case that prints the boottime wallclock periodically,
> >>> to help diagnose wallclock drift issues in the future.
> >>>
> >>> The idea is to wrmsr the MSR_KVM_WALL_CLOCK_NEW, and read the boottime
> >>> wallclock nanoseconds immediately.
> >>
> >> This doesn't actually test anything of interest though. IIUC, it requires a human
> >> looking at the output for it to provide any value. And it requires a manual
> >> cancelation, which makes it even less suitable for selftests.
> >>
> >> I like the idea, e.g. I bet there are more utilities that could be written that
> >> utilize the selftests infrastructure, just not sure what to do with this (assuming
> >> it can't be massaged into an actual test).
>
> Thank you very much for the suggestion.
>
> Would that work if I turn it into a test:
>
> 1. Capture boottime_wallclock_01.
> 2. Wait 10 seconds by default (configurable, e.g., up to 60 seconds).
> 3. Capture boottime_wallclock_02.
> 4. Report an error if drift is detected.

Rather than pick an arbitrary time of 10 seconds, deliberately introduce a
plausible bug in KVM (or re-introduce a previous bug) and see how low you can
push the wait time while still reliably detecting the unwanted drift. Then add
a reasonable buffer to give the test some margin for error. Given the drift that
David reported with the xen_shinfo test, I assume/hope that a 10 second runtime
would be overkill.
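
Something along these lines might work for the calibration step. This is only
a rough sketch, not selftests code: detects_drift_fn, calibrate_wait_ms() and
fake_detects() are hypothetical names, and the doubling search is just one
plausible way to find the smallest wait that detects the injected bug in every
trial before applying a safety margin.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical callback: run the drift check once against a KVM with the
 * bug injected, waiting wait_ms between the two wallclock reads. Returns
 * true if the drift was detected.
 */
typedef bool (*detects_drift_fn)(void *ctx, uint64_t wait_ms);

/*
 * Double the wait time from start_ms until the bug is detected in every
 * one of 'trials' runs, then multiply by 'margin' to leave some buffer.
 * Returns 0 if nothing up to max_ms detects the bug reliably.
 */
static uint64_t calibrate_wait_ms(detects_drift_fn detects, void *ctx,
				  uint64_t start_ms, uint64_t max_ms,
				  unsigned int trials, unsigned int margin)
{
	uint64_t wait_ms;

	for (wait_ms = start_ms; wait_ms && wait_ms <= max_ms; wait_ms *= 2) {
		unsigned int i;

		for (i = 0; i < trials; i++) {
			if (!detects(ctx, wait_ms))
				break;
		}
		if (i == trials)
			return wait_ms * margin;
	}
	return 0;
}

/*
 * Stand-in for a real run (assumption for illustration): the bug becomes
 * detectable once the wait reaches the threshold pointed to by ctx.
 */
static bool fake_detects(void *ctx, uint64_t wait_ms)
{
	return wait_ms >= *(uint64_t *)ctx;
}
```

With the stand-in and a 300 ms threshold, starting at 50 ms with a margin of 2,
the search tries 50, 100, 200 and 400 ms and settles on 800 ms.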

I would also differentiate between total runtime and the periodic check time,
e.g. to allow checking for drift every N (milli)seconds while having a total
runtime of M seconds. Then there's no need to set an upper bound, e.g. the user
could set the test to run in the background for multiple hours without having to
worry about the test being useless if it's canceled early.
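
To illustrate the split between check period and total runtime, here is a
minimal sketch. It is not tied to the selftests API: the callback types,
check_drift() and the fake_clock helpers are made-up names, and the fake clock
only simulates a boottime wallclock that creeps by a fixed amount per
millisecond. Because every period is checked on its own, the run is still
useful if it is stopped early.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: read the boottime wallclock (ns), and sleep. */
typedef int64_t (*read_wallclock_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, uint64_t ms);

struct drift_result {
	uint64_t checks;	/* number of periodic checks performed */
	int64_t max_drift_ns;	/* largest |drift| observed */
	bool failed;		/* true if drift exceeded the tolerance */
};

/*
 * Check the boottime wallclock every period_ms for a total of runtime_ms,
 * comparing each sample against the first one. Stop at the first sample
 * whose drift exceeds tolerance_ns.
 */
static struct drift_result check_drift(read_wallclock_fn read_wc,
				       sleep_ms_fn sleep_ms, void *ctx,
				       uint64_t period_ms, uint64_t runtime_ms,
				       int64_t tolerance_ns)
{
	struct drift_result res = { 0 };
	uint64_t elapsed;
	int64_t base;

	if (!period_ms)
		return res;

	base = read_wc(ctx);
	for (elapsed = 0; elapsed < runtime_ms; elapsed += period_ms) {
		int64_t drift;

		sleep_ms(ctx, period_ms);
		drift = read_wc(ctx) - base;
		if (drift < 0)
			drift = -drift;

		res.checks++;
		if (drift > res.max_drift_ns)
			res.max_drift_ns = drift;
		if (drift > tolerance_ns) {
			res.failed = true;
			break;
		}
	}
	return res;
}

/* Simulated clock (assumption): wallclock creeps drift_per_ms ns per ms. */
struct fake_clock {
	int64_t wallclock_ns;
	int64_t drift_per_ms;
};

static int64_t fake_read(void *ctx)
{
	return ((struct fake_clock *)ctx)->wallclock_ns;
}

static void fake_sleep(void *ctx, uint64_t ms)
{
	struct fake_clock *clk = ctx;

	clk->wallclock_ns += clk->drift_per_ms * (int64_t)ms;
}
```

In a real test, the read callback would wrap the guest's wrmsr of
MSR_KVM_WALL_CLOCK_NEW and the subsequent read, and the period, runtime and
tolerance would come from command-line options.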