Re: NTP dumps Linux, film at 11. [Fwd/FYI]

Riley Williams (rhw@bigfoot.com)
Wed, 2 Dec 1998 13:28:07 +0000 (GMT)


Hi Ted.

>> It was very noticeable. I upgraded the entire cluster (14
>> machines) in HH1202 one afternoon; half the machines started
>> gaining time like mad, the other half lost time like mad. All
>> machines identical, indeed from the same manufacturing batch as
>> far as I can tell (and all idle, the spring semester having just
>> ended at the time). Within 15 minutes all of them were outside of
>> the time window the AFS kaservers were willing to grant them (the
>> usual 5 minutes for Kerberos) and linux-afs had lost the battle to
>> resync their clocks. (10 minutes later xntpd was a standard part
>> of our Linux install....)

> Fascinating... as I said, I haven't seen this at all, and I've been
> using RedHat 5.x quite frequently. The two big differences are that (1) I
> don't use a 2.0 kernel, and (2) I don't run Linux-AFS. [(2) is
> related to (1). :-/ ]

> Have you seen systems lose time coherency without running
> Linux-AFS? The AFS code attempts to do time synchronization, and
> I'm wondering if that's buggy somehow. This sounds very much like a
> kernel issue, but you seem to have observed it being tied to RH5,
> which seems hard to believe, unless there's some particular RedHat
> package you're installing that is mucking with the time
> somehow.

> I can also ask the people who set up Linux-Athena at MIT if they've
> ever seen something like that. Linux-Athena is currently RH 5.0
> based, using Linux-AFS. Since I'm on the SIPB lists, I'd imagine I
> would have heard something about this, but it's possible that I
> might have missed it.

I can report on the following RH systems, none of which have ever run
Linux-AFS. The first three are my own, the last two are at a school
whose networking I set up over the summer.

1. Intel P166 based, RH 5.0 since upgraded to RH 5.1. The system
loses approximately 5 minutes a month if left to itself, but this
is compensated for by the adjtime facility, which was calibrated
using rdate.

2. Intel 486dx2/66, RH 5.1 freshly installed. It gained 3 seconds over
the last week and will be calibrated in the same way as the above
once it has been in use for a month. No sign of any racing clock.

3. Intel 386dx/33, RH 5.0 based, serves as a network print server. The
hardware clock loses 2 seconds a day, but this is dealt with by
the adjtime facility.

4. AMD K6/133, RH 5.1 clean-installed in late August. This system
has been running non-stop since then and has so far gained less
than a second without any adjustment whatsoever. As a result,
this system serves as the school's timeserver and gets time
requests from both PCs running Win9x and assorted Apple Macs.

5. Intel 386sx/16, 2x130M HD, 8M RAM, serves as a firewalled router.
This system gains approximately 12 MINUTES A DAY left to itself,
which is far too much for adjtime to cope with. As a result, it
runs rdate hourly to resync its clock to system (4) above.
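The drift figures in the list above are easier to compare as rates. The short Python sketch below converts each reported offset into parts per million (ppm) of elapsed time. It is only arithmetic on the numbers quoted above; the 30-day month used for system (1) is an assumption, and system (4) is left out because its exact running time isn't stated. On these figures, system (5) drifts at roughly 8,300 ppm, about seventy times the rate of system (1).

```python
# Convert the clock drifts reported above into parts per million (ppm).
# ppm = (seconds gained or lost) / (elapsed seconds) * 1e6

SECONDS_PER_DAY = 86_400


def drift_ppm(offset_seconds: float, elapsed_days: float) -> float:
    """Return drift in ppm; positive means the clock runs fast."""
    return offset_seconds / (elapsed_days * SECONDS_PER_DAY) * 1e6


reports = {
    # Assumption: "a month" is taken as 30 days.
    "1. P166 (loses ~5 min/month)": drift_ppm(-5 * 60, 30),
    "2. 486dx2/66 (gained 3 s in a week)": drift_ppm(3, 7),
    "3. 386dx/33 (loses 2 s/day)": drift_ppm(-2, 1),
    "5. 386sx/16 (gains ~12 min/day)": drift_ppm(12 * 60, 1),
}

for name, ppm in reports.items():
    print(f"{name}: {ppm:+.1f} ppm")
```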

Based on this, it appears that the problem is in the Linux-AFS code...

Best wishes from Riley.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/