Re: very odd hardware clock behavior

Guest section DW (dwguest@win.tue.nl)
Fri, 10 Dec 1999 21:26:16 +0100


On Thu, Dec 09, 1999 at 05:10:09PM -0500, Jeff DeFouw wrote:

> Ever since I started running a distributed.net client all the time on both
> processors, my hardware clock has been behaving very strangely. After
> booting 2.3.31 yesterday day I noticed that the kernel startup in my logs
> was dated Dec 30, 1996! ntpdate kicks in from the init scripts and fixes
> it. I saved the time with hwclock. However, it doesn't stay that way...
> last I was able to run hwclock it was about 40 minutes off. I also have a
> ton of "set_rtc_mmss: can't update from x to y" in my logs going back to
> the time I started running the distributed.net client. I'm currently
> running 2.3.31 SMP w/ two P3-450's, and I was running 2.3.28 SMP when I
> started running the client. After a while my hwclock will completely
> cease to function (and it has now, uptime is about 19 hours). It will
> just hang no matter what I tell it to do.

Yes - you want to kill this 11 minute mode in the kernel.

Concerning the messages:

Suppose set_rtc_mmss() is called just before the hour.
Say real_minutes=59 and cmos_minutes=0.
What happens?

if (abs(real_minutes - cmos_minutes) < 30) {
if (!(save_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
BIN_TO_BCD(real_seconds);
BIN_TO_BCD(real_minutes);
}
CMOS_WRITE(real_seconds,RTC_SECONDS);
CMOS_WRITE(real_minutes,RTC_MINUTES);
} else {
printk(KERN_WARNING
"set_rtc_mmss: can't update from %d to %d\n",
cmos_minutes, real_minutes);
retval = -1;
}

A message in syslog.

Concerning a hwclock hang:
Did you try "hwclock --directisa"?
For many people that works better than accessing the hardware clock
via /dev/rtc and the kernel.

Concerning the "last I was able to run hwclock it was about 40 minutes off":

There is the silly code

if (((abs(real_minutes - cmos_minutes) + 15)/30) & 1)
real_minutes += 30; /* correct for half hour time zone */

(On the one hand, nobody is in a half hour time zone.
On the other hand, last I was in Nepal the time difference
with China and India was an integral number of hours plus/minus
a quarter of an hour. On the third hand, still other time differences,
like 20 minutes, occur in sufficiently obscure places. This += 30
does not belong in the kernel - as this entire 11 minute business
does not belong in the kernel.)

So when things go a bit wrong, the kernel will add 30 minutes to the
error in the hardware clock.

How could things go a bit wrong? Hmm. There is also /etc/adjtime that
tries to follow the drift of the clock. Bad things will happen if
you set up hwclock to use /etc/adjtime and simultaneously have this
11 minute mode. Wild fluctuations may be the result.

Andries

P.S. The code fragments are from linux/arch/i386/kernel/time.c.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/