kernel clock for NTP

Ulrich Windl (windl@rkdvmhp1.dvm.klinik.uni-regensburg.de)
Wed, 21 Feb 1996 16:10:14 +0100


Hello!

The current Linux kernel clock still lacks the functions that xntpd
needs. ntp_gettime() can be emulated with adjtimex(), but
ntp_adjtime() takes many more arguments (as shown below).

Since I know that at least some people use Linux as a time server, I
hope that someone has patches for the current (as of Dec '94) clock
model. I can't believe that nobody has done it within a year.

Please excuse my lengthy quote, but maybe some don't know what this is
about, and please don't quote it again in full when replying.

Ulrich

(The following text is from xntp-3.5a/doc/README.kern ("A Kernel Model
for Precision Timekeeping", Revised 14 October 1994, Revised 13
December 1994))
[...]
4. Programming Model and Interfaces

This section describes the programming model for the synchronization
daemon and user application programs. The ideas are based on suggestions
from Jeff Mogul and Philip Gladstone and a similar interface designed by
the latter. It is important to point out that the functionality of the
original Unix adjtime() system call is preserved, so that the modified
kernel will work as the unmodified one, should the new features not be
in use. In this case the ntp_adjtime() system call can still be used to
read and write kernel variables that might be used by a synchronization
daemon other than NTP, for example.

The kernel routines use the clock state variable time_state, which
records whether the clock is synchronized, waiting for a leap second,
etc. The value of this variable is returned as the result code by both
the ntp_gettime() and ntp_adjtime() system calls. It is set implicitly
by the STA_DEL and STA_INS status bits, as described previously. Values
presently defined in the timex.h header file are as follows:

TIME_OK       0    no leap second warning
TIME_INS      1    insert leap second warning
TIME_DEL      2    delete leap second warning
TIME_OOP      3    leap second in progress
TIME_WAIT     4    leap second has occurred
TIME_ERROR    5    clock not synchronized

In case of a negative result code, the kernel has intercepted an invalid
address or (in case of the ntp_adjtime() system call), a superuser
violation.
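
As a small illustration of how a caller might interpret the result
code, the following C sketch maps the values listed above to their
descriptions. The function name time_state_str() is made up for this
example and is not part of timex.h; the enum only restates the table.

```c
/* Result codes as listed in the table above. */
enum {
    TIME_OK    = 0,
    TIME_INS   = 1,
    TIME_DEL   = 2,
    TIME_OOP   = 3,
    TIME_WAIT  = 4,
    TIME_ERROR = 5
};

/* Hypothetical helper: describe a result code returned by
 * ntp_gettime() or ntp_adjtime(). A negative code means the kernel
 * rejected the call. */
const char *time_state_str(int code)
{
    if (code < 0)
        return "rejected: invalid address or superuser violation";

    switch (code) {
    case TIME_OK:    return "no leap second warning";
    case TIME_INS:   return "insert leap second warning";
    case TIME_DEL:   return "delete leap second warning";
    case TIME_OOP:   return "leap second in progress";
    case TIME_WAIT:  return "leap second has occurred";
    case TIME_ERROR: return "clock not synchronized";
    default:         return "unknown state";
    }
}
```

A daemon would typically pass the return value of the system call
straight to such a helper when logging.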

4.1. The ntp_gettime() System Call

The syntax and semantics of the ntp_gettime() call are given in the
following fragment of the timex.h header file. This file is identical,
except for the SHIFT_HZ define, in the SunOS, Ultrix, OSF/1 and HP-UX
kernel distributions. (The SHIFT_HZ define represents the logarithm to
the base 2 of the clock oscillator frequency specific to each system
type.) Note that the timex.h file calls the syscall.h system header
file, which must be modified to define the SYS_ntp_gettime system call
specific to each system type. The kernel distributions include
directions on how to do this.

/*
* This header file defines the Network Time Protocol (NTP)
* interfaces for user and daemon application programs. These are
* implemented using private system calls and data structures and
* require specific kernel support.
*
* NAME
* ntp_gettime - NTP user application interface
*
* SYNOPSIS
* #include <sys/timex.h>
*
* int syscall(SYS_ntp_gettime, tptr)
*
* int SYS_ntp_gettime defined in syscall.h header file
* struct ntptimeval *tptr pointer to ntptimeval structure
*
* NTP user interface - used to read kernel clock values
* Note: maximum error = NTP synch distance = dispersion + delay / 2
* estimated error = NTP dispersion.
*/
struct ntptimeval {
    struct timeval time;    /* current time (ro) */
    long maxerror;          /* maximum error (us) (ro) */
    long esterror;          /* estimated error (us) (ro) */
};

The ntp_gettime() system call returns three read-only (ro) values in the
ntptimeval structure: the current time in unix timeval format plus the
maximum and estimated errors in microseconds. While the 32-bit long data
type limits the error quantities to something more than an hour, in
practice this is not significant, since the protocol itself will declare
an unsynchronized condition well below that limit. In the NTP Version 3
specification, if the protocol computes either of these values in excess
of 16 seconds, they are clamped to that value and the system clock
declared unsynchronized.
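
On a Linux system the same three values can be read with adjtimex()
and an empty mode word. The sketch below shows one way to do that. It
assumes the struct timex and struct ntptimeval definitions of a
present-day Linux C library (the member names modes and time come from
there, not from this README), and emulated_ntp_gettime() is a
hypothetical name.

```c
#include <string.h>
#include <sys/timex.h>

/* Hypothetical wrapper: fill an ntptimeval from a read-only
 * adjtimex() call. Assumes a Linux C library whose struct timex
 * carries the current time in its "time" member. */
int emulated_ntp_gettime(struct ntptimeval *ntv)
{
    struct timex tx;
    int state;

    memset(&tx, 0, sizeof tx);
    tx.modes = 0;               /* select nothing: read-only query */

    state = adjtimex(&tx);
    if (state < 0)
        return state;           /* errno was set by adjtimex() */

    /* Note: if the kernel reports STA_NANO in tx.status, tv_usec
     * holds nanoseconds rather than microseconds. */
    ntv->time = tx.time;
    ntv->maxerror = tx.maxerror;
    ntv->esterror = tx.esterror;
    return state;               /* clock state, TIME_OK..TIME_ERROR */
}
```

Because no mode bit is set, the call does not need superuser
privileges.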

Following is a detailed description of the ntptimeval structure members.

struct timeval time (ro)

This member is the current system time expressed as a Unix timeval
structure. The timeval structure consists of two 32-bit words; the
first is the number of seconds past 1 January 1970 assuming no
intervening leap-second insertions or deletions, while the second
is the number of microseconds within the second.

long maxerror (ro)

This member is the value of the time_maxerror kernel variable,
which represents the maximum error of the indicated time relative
to the primary synchronization source, in microseconds. For NTP,
the value is initialized by a ntp_adjtime() call to the
synchronization distance, which is equal to the root dispersion
plus one-half the root delay. It is increased by a small amount
(time_tolerance) each second to reflect the maximum clock frequency
error. This variable is provided by an ntp_adjtime() system call and
modified by the kernel, but is otherwise not used by the kernel.

long esterror (ro)

This member is the value of the time_esterror kernel variable,
which represents the expected error of the indicated time relative
to the primary synchronization source, in microseconds. For NTP,
the value is determined as the root dispersion, which represents
the best estimate of the actual error of the system clock based on
its past behavior, together with observations of multiple clocks
within the peer group. This variable is provided by an ntp_adjtime()
system call, but is otherwise not used by the kernel.
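
The arithmetic behind maxerror can be sketched in a few lines of C.
The function names and the MAXDISPERSE_US constant are invented for
this example. The per-second increment assumes that time_tolerance is
held in scaled ppm (integer part in the high-order 16 bits), so that
1 ppm over one second amounts to 1 us.

```c
/* 16 s clamp from the NTP Version 3 rule quoted above, in us.
 * The constant name is an assumption of this sketch. */
#define MAXDISPERSE_US 16000000L

/* Synchronization distance: root dispersion plus half the root
 * delay, all in microseconds. */
long sync_distance_us(long rootdisp_us, long rootdelay_us)
{
    return rootdisp_us + rootdelay_us / 2;
}

/* One second of aging: grow maxerror by the frequency tolerance.
 * tolerance_scaled is in scaled ppm (16.16 fixed point). */
long age_maxerror_us(long maxerror_us, long tolerance_scaled)
{
    return maxerror_us + (tolerance_scaled >> 16);
}

/* Clamp an error value to 16 s. Returns 1 if the value exceeded the
 * limit, meaning the clock should be declared unsynchronized. */
int clamp_error_us(long *err_us)
{
    if (*err_us > MAXDISPERSE_US) {
        *err_us = MAXDISPERSE_US;
        return 1;
    }
    return 0;
}
```

For example, a root dispersion of 5000 us and a root delay of 20000 us
give a synchronization distance of 15000 us, which grows by 200 us per
second at a tolerance of 200 ppm.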

4.2. The ntp_adjtime() System Call

The syntax and semantics of the ntp_adjtime() call are given in the
following fragment of the timex.h header file. Note that, as in the
ntp_gettime() system call, the syscall.h system header file must be
modified to define the SYS_ntp_adjtime system call specific to each
system type. In the fragment, rw = read/write, ro = read-only, wo =
write-only.

/*
* NAME
* ntp_adjtime - NTP daemon application interface
*
* SYNOPSIS
* #include <sys/timex.h>
*
* int syscall(SYS_ntp_adjtime, mode, tptr)
*
* int SYS_ntp_adjtime defined in syscall.h header file
* struct timex *tptr pointer to timex structure
*
* NTP daemon interface - used to discipline kernel clock
* oscillator
*/
struct timex {
    unsigned int mode;      /* mode selector (wo) */
    long offset;            /* time offset (us) (rw) */
    long frequency;         /* frequency offset (scaled ppm) (rw) */
    long maxerror;          /* maximum error (us) (rw) */
    long esterror;          /* estimated error (us) (rw) */
    int status;             /* clock status bits (rw) */
    long constant;          /* pll time constant (rw) */
    long precision;         /* clock precision (us) (ro) */
    long tolerance;         /* clock frequency tolerance (scaled
                             * ppm) (ro) */
    /*
     * The following read-only structure members are implemented
     * only if the PPS signal discipline is configured in the
     * kernel.
     */
    long ppsfreq;           /* pps frequency (scaled ppm) (ro) */
    long jitter;            /* pps jitter (us) (ro) */
    int shift;              /* interval duration (s) (shift) (ro) */
    long stabil;            /* pps stability (scaled ppm) (ro) */
    long jitcnt;            /* jitter limit exceeded (ro) */
    long calcnt;            /* calibration intervals (ro) */
    long errcnt;            /* calibration errors (ro) */
    long stbcnt;            /* stability limit exceeded (ro) */
};

The ntp_adjtime() system call is used to read and write certain time-
related kernel variables summarized below. Writing these variables can
only be done in superuser mode. To write a variable, the mode structure
member is set with one or more bits, one of which is assigned each of
the following variables in turn. The current values for all variables
are returned in any case; therefore, a mode argument of zero means to
return these values without changing anything.
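
The select-then-return behaviour of the mode word can be modelled in
user space as follows. This is only a toy model, not kernel code:
struct mini_timex and model_adjtime() are invented names, the
superuser check is left out, and the residual-offset and read-only
status bit handling of the real call are ignored. The MOD_* values are
those listed in the description of the mode member.

```c
/* Mode bits as listed in the description of the mode member. */
enum {
    MOD_OFFSET    = 0x0001,
    MOD_FREQUENCY = 0x0002,
    MOD_MAXERROR  = 0x0004,
    MOD_ESTERROR  = 0x0008,
    MOD_STATUS    = 0x0010,
    MOD_TIMECONST = 0x0020
};

/* Reduced stand-in for struct timex: writable members only. */
struct mini_timex {
    unsigned int mode;
    long offset, frequency, maxerror, esterror;
    int status;
    long constant;
};

/* Toy model: copy each selected member of the request into the
 * "kernel" state, then return all current values to the caller. */
void model_adjtime(struct mini_timex *kern, struct mini_timex *req)
{
    unsigned int mode = req->mode;

    if (mode & MOD_OFFSET)    kern->offset    = req->offset;
    if (mode & MOD_FREQUENCY) kern->frequency = req->frequency;
    if (mode & MOD_MAXERROR)  kern->maxerror  = req->maxerror;
    if (mode & MOD_ESTERROR)  kern->esterror  = req->esterror;
    if (mode & MOD_STATUS)    kern->status    = req->status;
    if (mode & MOD_TIMECONST) kern->constant  = req->constant;

    *req = *kern;       /* current values are returned in any case */
    req->mode = mode;   /* mode is write-only; leave it as passed */
}
```

With mode set to zero the function changes nothing and only reports
the current state, which matches the read-only use described above.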

Following is a description of the timex structure members.

mode (wo)

This is a bit-coded variable selecting one or more structure
members, with one bit assigned each member. If a bit is set, the
value of the associated member variable is copied to the
corresponding kernel variable; if not, the member is ignored. The
bits are assigned as given in the following, with the variable name
indicated in parens. Note that the precision, tolerance and PPS
variables are determined by the kernel and cannot be changed by
ntp_adjtime().

MOD_OFFSET       0x0001    time offset (offset)
MOD_FREQUENCY    0x0002    frequency offset (frequency)
MOD_MAXERROR     0x0004    maximum time error (maxerror)
MOD_ESTERROR     0x0008    estimated time error (esterror)
MOD_STATUS       0x0010    clock status (status)
MOD_TIMECONST    0x0020    pll time constant (constant)
MOD_CLKB         0x4000    set clock B
MOD_CLKA         0x8000    set clock A

Note that the MOD_CLKA and MOD_CLKB bits are intended for those
systems where more than one hardware clock is available for backup,
such as in Tandem Non-Stop computers. Presumably, in such cases
each clock would have its own oscillator and require a separate PLL
for each. Refinements to this model are for further study. The
interpretation of these bits is as follows:

offset (rw)

If selected, this member specifies the time adjustment, in
microseconds. The absolute value must be less than MAXPHASE
(128000) microseconds defined in the timex.h header file. On
return, this member contains the residual offset remaining between
a previously specified offset and the current system time, in
microseconds.

frequency (rw)

If selected, this member replaces the value of the time_freq
kernel variable. The value is in ppm, with the integer part in the
high order 16 bits and fraction in the low order 16 bits. The
absolute value must be in the range less than MAXFREQ (100) ppm
defined in the timex.h header file.

The time_freq variable represents the frequency offset of the CPU
clock oscillator. It is recalculated as each update to the system
clock is determined by the offset member of the timex structure. It
is usually set from a value stored in a file when the
synchronization daemon is first started. The current value is
usually retrieved via this member and written to the file about
once per hour.
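
The scaled-ppm encoding (integer part in the high-order 16 bits,
fraction in the low-order 16 bits) is a 16.16 fixed-point format. A
minimal conversion sketch in C follows. The helper names are invented,
and the limit is passed as a parameter rather than hard-coded, since
the value of MAXFREQ depends on the configuration.

```c
/* Convert ppm to scaled ppm (16.16 fixed point), rounding to the
 * nearest representable value. */
long ppm_to_scaled(double ppm)
{
    double v = ppm * 65536.0;
    return (long)(v >= 0 ? v + 0.5 : v - 0.5);
}

/* Convert scaled ppm back to ppm. */
double scaled_to_ppm(long scaled)
{
    return (double)scaled / 65536.0;
}

/* Range check described for the frequency member: the absolute
 * value must be less than maxfreq_ppm. */
int freq_in_range(long scaled, long maxfreq_ppm)
{
    long limit = maxfreq_ppm << 16;
    return scaled > -limit && scaled < limit;
}
```

For instance, 100 ppm is encoded as 100 << 16 = 6553600, and 12.5 ppm
as 819200.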

maxerror (rw)

If selected, this member replaces the value of the time_maxerror
kernel variable, in microseconds. This is the same variable as in
the ntp_gettime() system call.

esterror (rw)

If selected, this member replaces the value of the time_esterror
kernel variable, in microseconds. This is the same variable as in
the ntp_gettime() system call.

status (rw)

If selected, this member replaces the value of the time_status
kernel variable. This variable controls the state machine used to
insert or delete leap seconds and shows the status of the
timekeeping system, PPS signal and external oscillator, if
configured.

STA_PLL          0x0001    enable PLL updates (rw)
STA_PPSFREQ      0x0002    enable PPS freq discipline (rw)
STA_PPSTIME      0x0004    enable PPS time discipline (rw)
STA_FLL          0x0008    select FLL mode (rw)

STA_INS          0x0010    insert leap (rw)
STA_DEL          0x0020    delete leap (rw)
STA_UNSYNC       0x0040    clock unsynchronized (rw)
STA_FREQHOLD     0x0080    frequency hold (rw)

STA_PPSSIGNAL    0x0100    PPS signal present (ro)
STA_PPSJITTER    0x0200    PPS signal jitter exceeded (ro)
STA_PPSWANDER    0x0400    PPS signal wander exceeded (ro)
STA_PPSERROR     0x0800    PPS signal calibration error (ro)
STA_CLOCKERR     0x1000    clock hardware fault (ro)

The interpretation of these bits is as follows:

STA_PLL set/cleared by the caller to enable PLL updates

STA_PPSFREQ set/cleared by the caller to enable PPS frequency
discipline

STA_PPSTIME set/cleared by the caller to enable PPS time
discipline

STA_FLL set/cleared by the caller; set selects FLL mode,
clear selects PLL mode.

STA_INS set by the caller to insert a leap second at the end
of the current day; cleared by the caller after the
event

STA_DEL set by the caller to delete a leap second at the end
of the current day; cleared by the caller after the
event

STA_UNSYNC set/cleared by the caller to indicate clock
unsynchronized (e.g., when no peers are reachable)

STA_FREQHOLD set/cleared by the caller to disable frequency
update.

STA_PPSSIGNAL set/cleared by the hardpps() fragment to indicate
PPS signal present

STA_PPSJITTER set/cleared by the hardpps() fragment to indicate
PPS signal jitter exceeded

STA_PPSWANDER set/cleared by the hardpps() fragment to indicate
PPS signal wander exceeded

STA_PPSERROR set/cleared by the hardpps() fragment to indicate
PPS signal calibration error

STA_CLOCKERR set/cleared by the external hardware clock driver to
indicate hardware fault

An error condition is raised when (a) either STA_UNSYNC or
STA_CLOCKERR is set (loss of synchronization), (b) STA_PPSFREQ or
STA_PPSTIME is set and STA_PPSSIGNAL is clear (loss of PPS signal),
(c) STA_PPSTIME and STA_PPSJITTER are both set (jitter exceeded),
(d) STA_PPSFREQ is set and either STA_PPSWANDER or STA_PPSERROR is
set (wander exceeded). An error condition results in a system call
return code of TIME_ERROR.
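
The four conditions (a) to (d) translate directly into bit tests. The
following C sketch shows one possible encoding. The function name
status_is_error() is made up for this example, and the STA_* values
are repeated from the table above.

```c
/* Status bits used by the error test, as listed in the table. */
enum {
    STA_PLL       = 0x0001,
    STA_PPSFREQ   = 0x0002,
    STA_PPSTIME   = 0x0004,
    STA_UNSYNC    = 0x0040,
    STA_PPSSIGNAL = 0x0100,
    STA_PPSJITTER = 0x0200,
    STA_PPSWANDER = 0x0400,
    STA_PPSERROR  = 0x0800,
    STA_CLOCKERR  = 0x1000
};

/* Return 1 if the status word describes an error condition, i.e.
 * the system call would return TIME_ERROR; otherwise return 0. */
int status_is_error(int status)
{
    /* (a) loss of synchronization */
    if (status & (STA_UNSYNC | STA_CLOCKERR))
        return 1;

    /* (b) PPS discipline enabled but no PPS signal */
    if ((status & (STA_PPSFREQ | STA_PPSTIME)) &&
        !(status & STA_PPSSIGNAL))
        return 1;

    /* (c) PPS time discipline enabled and jitter exceeded */
    if ((status & STA_PPSTIME) && (status & STA_PPSJITTER))
        return 1;

    /* (d) PPS frequency discipline enabled and wander exceeded or
     * calibration error */
    if ((status & STA_PPSFREQ) &&
        (status & (STA_PPSWANDER | STA_PPSERROR)))
        return 1;

    return 0;
}
```

Note that a jitter or wander flag alone is not an error; it only
counts when the corresponding PPS discipline is enabled.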

constant (rw)

If selected, this member replaces the value of the time_constant
kernel variable. The value must be between zero and MAXTC (6)
defined in the timex.h header file.

The time_constant variable determines the bandwidth or "stiffness"
of the PLL. The value is used as a shift between zero and MAXTC
(6), with the effective PLL time constant equal to a multiple of (1
<< time_constant), in seconds. For room-temperature quartz
oscillators, the recommended default value is 2, which corresponds
to a PLL time constant of about 900 s and a maximum update interval
of about 64 s. The maximum update interval scales directly with the
time constant, so that at the maximum time constant of 6, the
update interval can be as large as 1024 s.

Values of time_constant between zero and 2 can be used if quick
convergence is necessary; values between 2 and 6 can be used to
reduce network load, but at a modest cost in accuracy. Values above
6 are appropriate only if a precision external oscillator is
present.
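
The two figures quoted above (64 s at a time constant of 2, 1024 s at
6) are both consistent with a maximum update interval of
16 << time_constant seconds. The sketch below encodes that relation.
The factor 16 is inferred from those two data points and is not stated
in the text, and max_update_interval_s() is a hypothetical name.

```c
#define MAXTC 6     /* upper limit of the time constant, per the text */

/* Maximum update interval in seconds for a given time constant, or
 * -1 if the time constant is outside 0..MAXTC.
 * ASSUMPTION: interval = 16 << time_constant, inferred from the
 * values 64 s (constant 2) and 1024 s (constant 6). */
long max_update_interval_s(int time_constant)
{
    if (time_constant < 0 || time_constant > MAXTC)
        return -1;
    return 16L << time_constant;
}
```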

precision (ro)

This is the current value of the time_precision kernel variable in
microseconds.

The time_precision variable represents the maximum error in reading
the system clock, in microseconds. It is usually based on the
number of microseconds between timer interrupts (tick), 10000 us
for the SunOS and HP-UX kernels, 3906 us for the Ultrix kernel, 976
us for the OSF/1 kernel. However, in cases where the time can be
interpolated between timer interrupts with microsecond resolution,
such as in the stock SunOS and HP-UX kernels and modified Ultrix
and OSF/1 kernels, the precision is specified as 1 us. In cases
where a PPS signal or external oscillator is available, the
precision can depend on the operating condition of the signal or
oscillator. This variable is determined by the kernel for use by
the synchronization daemon, but is otherwise not used by the
kernel.

tolerance (ro)

This is the current value of the time_tolerance kernel variable.
The value is in ppm, with the integer part in the high order 16
bits and fraction in the low order 16 bits.

The time_tolerance variable represents the maximum frequency error
in ppm of the particular CPU clock oscillator and is a property of
the hardware; however, in principle it could change as a result of
the presence of external discipline signals, for instance.

The recommended value for time_tolerance, MAXFREQ (200 ppm), is
appropriate for room-temperature quartz oscillators used in typical
workstations. However, it can change due to the operating condition
of the PPS signal and/or external oscillator. With either the PPS
signal or external oscillator, the recommended value for MAXFREQ is
100 ppm.

The following members are defined only if the PPS_SYNC option is
specified in the kernel configuration file. These members are useful
primarily as a monitoring and evaluation tool. These variables can be
written only by the kernel.

ppsfreq (ro)

This is the current value of the pps_freq kernel variable, which is
the CPU clock oscillator frequency offset relative to the PPS
discipline signal. The value is in ppm, with the integer part in
the high order 16 bits and fraction in the low order 16 bits.

jitter (ro)

This is the current value of the pps_jitter kernel variable, which
is the average PPS time dispersion measured by the time-offset
median filter, in microseconds.

shift (ro)

This is the current value of the pps_shift kernel variable, which
determines the duration of the calibration interval as the value of
1 << pps_shift, in seconds.

stabil (ro)

This is the current value of the pps_stabil kernel variable, which
is the average PPS frequency dispersion measured by the frequency-
offset median filter. The value is in ppm, with the integer part in
the high order 16 bits and fraction in the low order 16 bits.

jitcnt (ro)

This is the current value of the pps_jitcnt kernel variable, which
counts the number of PPS signals where the average jitter exceeds
the threshold MAXTIME (200 us).

calcnt (ro)

This is the current value of the pps_calcnt kernel variable, which
counts the number of frequency calibration intervals. The duration
of these intervals can range from 4 to 256 seconds, as determined
by the pps_shift kernel variable.

errcnt (ro)

This is the current value of the pps_errcnt kernel variable, which
counts the number of frequency calibration cycles where (a) the
apparent frequency offset is greater than MAXFREQ (100 ppm) or (b)
the interval jitter exceeds tick * 2.

stbcnt (ro)

This is the current value of the pps_discnt kernel variable, which
counts the number of calibration intervals where the average
stability exceeds the threshold MAXFREQ / 4 (25 ppm).
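
The relation between the shift member and the calibration interval can
be written as a one-line helper. In this C sketch the bounds 2 and 8
are derived from the stated range of 4 to 256 seconds, and the names
PPS_SHIFT_MIN, PPS_SHIFT_MAX and pps_interval_s() are invented for the
example.

```c
/* Bounds derived from the 4 s to 256 s range given for calcnt.
 * The macro names are assumptions of this sketch. */
#define PPS_SHIFT_MIN 2     /* 1 << 2 = 4 s   */
#define PPS_SHIFT_MAX 8     /* 1 << 8 = 256 s */

/* Calibration interval in seconds for a given pps_shift value, or
 * -1 if the shift is outside the documented range. */
long pps_interval_s(int pps_shift)
{
    if (pps_shift < PPS_SHIFT_MIN || pps_shift > PPS_SHIFT_MAX)
        return -1;
    return 1L << pps_shift;
}
```

A monitoring tool could use this to turn the raw shift value into a
duration before printing it next to calcnt and errcnt.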

[...]