Problems in ALPHA Timekeeping

bde@accessone.com
Wed, 2 Dec 1998 15:59:26 -0800


I tried using xntpd to sync my UDB's time to another machine, and in the
process, found two problems in arch/alpha/kernel/time.c.

First, the microsecond part returned by gettimeofday is wrong. The code
at time.c:369 is off by a factor of two:

/*
* usec = cycles * ticks_per_cycle * 2**48 * 1e6 / (2**48 * ticks)
* = cycles * (s_t_p_c) * 1e6 / (2**48 * ticks)
* = cycles * (s_t_p_c) * 15625 / (2**42 * ticks)
*
* which, given a 600MHz cycle and a 1024Hz tick, has a
* dynamic range of about 1.7e17, which is less than the
* 1.8e19 in an unsigned long, so we are safe from overflow.
*
* Round, but with .5 up always, since .5 to even is harder
* with no clear gain.
*/

delta_usec = delta_cycles * state.scaled_ticks_per_cycle * 15625;
----> delta_usec = ((delta_usec / ((1UL << (FIX_SHIFT-6)) * HZ)) + 1) / 2;

usec += delta_usec;

There is no allowance for the final divide-by-two, so the increment is
half what it should be. The attached program can demonstrate this by
examining the "jump" values. These are separated by about 977
microseconds, which is the tick period. Note that even with this fixed
there are small jumps, due to the HZ-related overhead. Note also that
using gettimeofday() for short loop timing makes the machine appear to
be twice as fast as it really is.

Second, timekeeping is erratic under heavy I/O load. This is easy to
see by letting xntpd synchronize the clock, checking the offset with
ntpdate -q host, running a partial kernel build, and finally rechecking
the offset with ntpdate.

The problem is in the conversion of elapsed machine cycles to ticks.
The cycle count is multiplied to scaled (by 2^48) ticks which is then
rounded. The assumption appears to be that:

Round(Sum(values...) = Sum(Round(values)...)

If the average values are around one, this can be grossly in error.
It is easy to construct an example where the tick count will be low by
one-third.

The patch below retains the 2^48 scaling method to avoid other impacts,
but a better scheme would probably use the cycle count itself. (Line
numbers may vary somewhat, due to an unrelated local patch.)

--- linux.rels/arch/alpha/kernel/time.c Tue Oct 20 10:43:21 1998
+++ linux/arch/alpha/kernel/time.c Sat Nov 28 15:00:15 1998
@@ -60,6 +63,8 @@
unsigned long scaled_ticks_per_cycle;
/* last time the CMOS clock got updated */
time_t last_rtc_update;
+ /* partial unused tick */
+ unsigned long partial_tick;
} state;

unsigned long est_cycle_freq;
@@ -79,8 +84,6 @@
*/
void timer_interrupt(int irq, void *dev, struct pt_regs * regs)
{
- const unsigned long half = 1UL << (FIX_SHIFT - 1);
- const unsigned long mask = (1UL << (FIX_SHIFT + 1)) - 1;
unsigned long delta;
__u32 now;
long nticks;
@@ -96,22 +99,21 @@
#endif

/*
- * Estimate how many ticks have passed since the last update.
- * Round the result, .5 to even. When we loose ticks due to
- * say using IDE, the clock has been seen to run up to 15% slow
- * if we truncate.
+ * Calculate how many ticks have passed since the last update,
+ * including any previous partial leftover. Save any resulting
+ * fraction for the next pass.
*/
now = rpcc();
delta = now - state.last_time;
state.last_time = now;
- delta = delta * state.scaled_ticks_per_cycle;
- if ((delta & mask) != half)
- delta += half;
+ delta = delta * state.scaled_ticks_per_cycle + state.partial_tick;
+ state.partial_tick = delta & ((1UL << FIX_SHIFT) - 1);
nticks = delta >> FIX_SHIFT;

- do {
+ while (nticks > 0) {
do_timer(regs);
- } while (--nticks > 0);
+ nticks--;
+ }

/*
* If we have an externally synchronized Linux clock, then update
@@ -325,6 +356,7 @@
state.scaled_ticks_per_cycle
= ((unsigned long) HZ << FIX_SHIFT) / cycle_freq;
state.last_rtc_update = 0;
+ state.partial_tick = 0L;

/* setup timer */
irq_handler = timer_interrupt;
@@ -366,7 +398,7 @@
*/

delta_usec = delta_cycles * state.scaled_ticks_per_cycle * 15625;
- delta_usec = ((delta_usec / ((1UL << (FIX_SHIFT-6)) * HZ)) + 1) / 2;
+ delta_usec = ((delta_usec / ((1UL << (FIX_SHIFT-6-1)) * HZ)) + 1) / 2;

usec += delta_usec;
if (usec >= 1000000) {

The attached program (gett.c):

#include <sys/time.h>
#include <stdio.h>
#include <malloc.h>

main(int argc, char **argv)
{
struct timeval tmv;
struct timeval *tvb;
int ct;
int i;
int usec;
int prusec;

if (argc != 2) {
usage();
}
if ((ct = strtol(argv[1], (char **)0, 10)) <= 0) {
usage();
}
if ((tvb = (struct timeval *)malloc(ct * sizeof(struct timeval))) ==
NULL) {
perror("timeval array");
exit(1);
}
for (i = 0; i < ct; i++) {
if (gettimeofday(&tmv, NULL) < 0) {
perror("?");
exit(1);
}
tvb[i] = tmv;
}
prusec = 0;
for (i = 0; i < ct; i++) {
usec = tvb[i].tv_sec * 1000000 + tvb[i].tv_usec;
printf("%10d.%06d%6d\n", tvb[i].tv_sec, tvb[i].tv_usec,
(prusec ? usec - prusec : 0));
prusec = usec;
}
}

usage()
{
fprintf(stderr, "usage: gett count\n");
exit(1);
}

Invoke this as "gett 2000 > outfile", and examine the result.

-- 
B. D. Elliott   bde@accessone.com   (Seattle)

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/