SUMMARY: NTP won't synchronize anymore

From: Christopher L. Barnard (cbar44@tsg.cbot.com)
Date: Wed Apr 09 1997 - 12:55:33 CDT


>
> We have our own stratum 1 timesync source that is unfortunately very
> old and can't deal with daylight savings time. (We have to manually
> change the hour offset from GMT. ugh). So we shut down NTP on the
> stratum 2 servers that obtain their time from this clock, set the hour
> offset from GMT on the radio clock, wait for DST to start, and then
> start NTP back up again.
>
> Unfortunately it won't restart cleanly. All of our machines have been
> floating freely since Sunday morning, and they refuse to sync up. A
> typical ntpq "peers" command looks like this:
>
> ntpq> pe
> remote          refid        st t when poll reach   delay   offset     disp
> ===========================================================================
> ntpc3.cbot.com  .IRIG.        1 u   26   64     3    2.67  -287.52  7875.81
> srvcbt1.cbot.co LOCAL(1)      2 u   23   64     3    3.74   -0.089  7875.06
> srvcbt2.cbot.co LOCAL(1)      2 u   53   64     3    3.83  385.456  7875.34
> srvagd.info.cbo 0.0.0.0      16 u   40   64     0    0.00    0.000  16000.0
> auddev.audit.cb 0.0.0.0      16 -    -   64     0    0.00    0.000  16000.0
> admdev2.admin.c srvcbt2.cbot  3 u   11   64     7    3.25  194.292  7875.99
> srvcon_10.cbot. 0.0.0.0      16 u   54   64     0    0.00    0.000  16000.0
>
> (srvagd, auddev, and srvcon_10 are this machine's peers. They are in the
> same boat that this machine is in). This machine should be binding to
> ntpc3 (a new stratum 1 radio clock that knows how to deal with DST) or
> srvcbt1/srvcbt2 (the stratum 2 servers that get their time from the old
> radio clocks).
>
> I've watched ntpq closely, and what is happening is that after it receives
> four reliable signals from a higher stratum machine (at which point
> it should bind), the delay & offset reset to 0 and the dispersion resets
> to 16000.0. The message
>
> Apr 8 11:56:24 cosmos xntpd[24059]: Previous time adjustment incomplete; residual 0.003749 sec
>
> appears in /var/adm/messages at the same time. What has me baffled is why
> I'm getting a SunOS error message on a Solaris 2.5 machine?! The "previous
> time adjustment" error was a problem under SunOS 4.1.3 and earlier that
> was fixed by running tickadj on startup. This isn't supposed to be necessary
> on Solaris (and hasn't been for the years that NTP has been running...)
> I've got the
>
> set dosynctodr=0
>
> in my /etc/system, and have had it there since day one.
>
> Can anyone give me any ideas as to what is going wrong? Thanks much,
> and summary will be forthcoming...
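
A note on the "reach" column in the ntpq output quoted above: ntpq prints it in octal, and it is an 8-bit shift register in which each bit records whether one poll got a reply, with the newest poll in the lowest bit. So a reach of 3 means the last two polls were answered, 7 means the last three, and four answers in a row would show as 17. The following is only a minimal sketch of that decoding; the function names are made up for illustration and are not part of ntpq or xntpd.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, newest first.

    reach_octal is the string shown in ntpq's "reach" column (octal).
    """
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(8)]


def consecutive_recent_replies(reach_octal):
    """Count how many of the most recent polls in a row were answered."""
    count = 0
    for answered in decode_reach(reach_octal):
        if not answered:
            break
        count += 1
    return count


# Reach values taken from the peers listing quoted above.
for name, reach in [("ntpc3", "3"), ("admdev2", "7"), ("srvagd", "0")]:
    print(name, consecutive_recent_replies(reach))
```

Watching this count climb to 4 and then drop back to 0 would match the reset described in the question.
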

The solution:

chill.

be patient.

By this morning everything had synched back up. It just took four days
for everyone to get happy, that's all. The next time we have to adjust
our time source like this, I'll be sure to blow away the /etc/ntp.drift
file on the high stratum servers, which should help them get back in
sync much, much faster.

Thanks to:

Andy J. Stefancik <ajs6143@eerpf001.ca.boeing.com>
and
Michael Kohne <mhkohne@moberg.com>

+-----------------------------------------------------------------------+
| Christopher L. Barnard O When I was a boy I was told that |
| cbarnard@tsg.cbot.com / \ anybody could become president. |
| (312) 347-4901 O---O Now I'm beginning to believe it. |
| http://www.cs.uchicago.edu/~cbarnard --Clarence Darrow |
+----------PGP public key available via finger or PGP keyserver---------+


