RE: [Xen-devel] [PATCH] xen: always set the sched clock as unstable

From: Dan Magenheimer
Date: Mon Apr 16 2012 - 12:06:06 EST

> From: David Vrabel [mailto:david.vrabel@xxxxxxxxxx]
> Subject: Re: [Xen-devel] [PATCH] xen: always set the sched clock as unstable

Nacked-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

(Apologies for missing the original post... our Oracle mail server
has gone bonkers again... classifying nearly all (but not all) xen-devel
email as spam. This problem started when we moved to a different
ISP last year, was supposedly fixed by Oracle IT, and has just
started being a problem again. Argh!)

> On 16/04/12 12:32, Jan Beulich wrote:
> >>>> On 13.04.12 at 20:20, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> >> From: David Vrabel <david.vrabel@xxxxxxxxxx>
> >>
> >> The sched clock was considered stable based on the capabilities of the
> >> underlying hardware. This does not make sense for Xen PV guests as:
> >> a) the hardware TSC is not used directly as the clock source; and b)
> >> guests may migrate to hosts with different hardware capabilities.
> >>
> >> It is not clear to me whether the Xen clock source is supposed to be
> >> stable and whether it should be stable across migration. For a clock
> >> source to be stable it must be: a) monotonic; b) synchronized across
> >> CPUs; and c) constant rate.
> Tim, Thomas, can you comment on the above paragraph? Is it correct?

(Sigh... I keep seeing clock-related things, wish I had more time
to spend on them, cursing, and going back to other things. But,
I need to comment further here...)

Hmmm... I spent a great deal of time on TSC support in the hypervisor
2-3 years ago. I worked primarily on PV, but Intel supposedly was tracking
everything on HVM as well. There's most likely a bug or two still lurking
but, for all guests, with the default tsc_mode, TSC is provided by Xen
as an absolutely stable clock source. If Xen determines that the underlying
hardware declares that TSC is stable, guest rdtsc instructions are not trapped.
If it is not, Xen emulates all guest rdtsc instructions. After a migration or
save/restore, TSC is always emulated. The result is (ignoring possible
bugs) that TSC as provided by Xen is a) monotonic; b) synchronized across
CPUs; and c) constant rate. Even across migration/save/restore.
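
As a rough illustration of the default-mode behaviour described above,
the trap/emulate decision can be sketched as below. This is only a
sketch: the struct and function names are hypothetical and are not
taken from the Xen source; docs/misc/tscmode.txt remains the
authoritative description.

```c
#include <stdbool.h>

/*
 * Hypothetical inputs for illustration only; these names do not
 * come from the Xen tree.
 */
struct tsc_state {
    bool host_tsc_reliable;     /* hardware declares TSC stable */
    bool migrated_or_restored;  /* guest was migrated or saved/restored */
};

/*
 * Returns true if guest rdtsc instructions should be trapped and
 * emulated, following the default tsc_mode rules described in the
 * text: always emulate after migration or save/restore; otherwise
 * emulate only when the hardware TSC is not declared stable.
 */
static bool emulate_guest_rdtsc(const struct tsc_state *s)
{
    if (s->migrated_or_restored)
        return true;
    return !s->host_tsc_reliable;
}
```

In either branch the guest is meant to see the same three properties;
only the cost of each rdtsc differs.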

This should be true for Xen 4.0+ (but not for pre-Xen-4.0).

Please see docs/misc/tscmode.txt in the xen tree. Though
it may appear at first to be targeted at a different audience,
all the relevant info is in there if you read it all the way through.

(If you have any questions or disagreements on that doc, please start
a new thread and cc me directly since my list access is unreliable.)

> >> There have also been reports of systems with apparently unstable
> >> clocks where clearing sched_clock_stable has fixed problems with
> >> migrated VMs hanging.
> >>
> >> So, always set the sched clock as unstable when using the Xen clock
> >> source.
> >>
> >> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> >> ---
> >> arch/x86/xen/time.c | 1 +
> >> 1 files changed, 1 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> >> index 0296a95..8469b5a 100644
> >> --- a/arch/x86/xen/time.c
> >> +++ b/arch/x86/xen/time.c
> >> @@ -473,6 +473,7 @@ static void __init xen_time_init(void)
> >> do_settimeofday(&tp);
> >>
> >> setup_force_cpu_cap(X86_FEATURE_TSC);
> >> + sched_clock_stable = 0;
> >
> > This, unfortunately, is not sufficient afaict: If a CPU gets brought up
> > post-boot, the variable may need to be cleared again. Instead you
> > ought to call mark_tsc_unstable().
> Yeah, mark_tsc_unstable() is the right thing to do.


No, no, no. The exact opposite is true. As on VMware, TSC is
stable. The issue is that Linux trusts other clock hardware more
completely than TSC so whenever there is a problem with another
clocksource, Linux blames TSC and marks TSC unstable. But TSC
on Xen 4.0+ is innocent. In fact, TSC is a better clocksource
choice than clocksource=xen (aka pvclock) because pvclock
indirectly depends on TSC.

For upstream kernels, the answer is to set clocksource=tsc
and tsc=reliable, like VMware enforces. See:

In fact, it might be wise for a Xen-savvy kernel to check to see
if it is running on Xen-4.0+ and, if so, force clocksource=tsc
and tsc=reliable.
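
To confirm which clocksource a guest actually ended up with, something
like the following userspace sketch could be used. It assumes the
standard Linux sysfs file current_clocksource (not mentioned elsewhere
in this mail), and the helper names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Standard Linux sysfs location of the active clocksource name. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of 'path' into 'buf' (at most len-1 chars) and
 * strip the trailing newline. Returns 0 on success, -1 on failure.
 * Hypothetical helper, for illustration only.
 */
static int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns nonzero if the given clocksource name is "tsc". */
static int clocksource_is_tsc(const char *name)
{
    return strcmp(name, "tsc") == 0;
}
```

Calling read_clocksource(CLOCKSOURCE_PATH, buf, sizeof(buf)) in a guest
booted with clocksource=tsc tsc=reliable would be expected to yield
"tsc", though I have not verified that on every kernel version.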

There have been very odd rare problems reported in Xen time
handling for a very long time. These usually manifest as some
kind of "TSC is not stable" message from a guest Linux kernel,
but the symptoms always point away from TSC as the culprit.
Forcing Xen-savvy guests to use TSC will either make these problems
go away (if they haven't already been fixed) or allow us to find
the obscure underlying hypervisor bugs rather than paper over them.


P.S. For anyone new to this area, see VMware's classic document:

P.P.S. Note this recent kernel issue, which is related but
likely not seen in Xen. It requires CPU overcommitment
at boot time, while the kernel is calibrating TSC.