monitoring NTP

by Tom Metro-16

I rebooted a MythTV server at a moment when local DNS happened not to
be working, so when ntpd started it couldn't resolve the address of the
server it was set to synchronize with. According to the logs, it gave up
after a few tries, which is disappointing.

Even if it had resumed trying later, it isn't clear that it would have
corrected the system time (which was off by hours, due to a dead
motherboard battery), as a manual restart of ntpd didn't resolve the
problem. I had to stop it (otherwise there is a port conflict), run
ntpdate, then restart ntpd. I thought ntpd stepped the time if there was
a large delta. (According to /etc/default/ntp the -g option is being
specified, which is supposed to permit ntpd to make large steps when
initially started.)
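
For reference, the manual recovery described above could be scripted
roughly as follows. This is only a sketch: resync_clock and RUN are
made-up names, /etc/init.d/ntp is assumed to be the init script on this
Ubuntu box, and the server name is whatever you pass in. With RUN unset
the commands are only printed, not executed.

```shell
# Hypothetical helper: stop ntpd (it holds the NTP port), set the clock
# once with ntpdate, then start ntpd again.
resync_clock() {
    server=$1
    run=${RUN-echo}    # RUN unset: dry run; RUN set to empty: really execute
    $run /etc/init.d/ntp stop &&
    $run ntpdate "$server" &&
    $run /etc/init.d/ntp start
}
```

Calling "resync_clock ntp.example.com" prints the three commands; setting
RUN to an empty string and running as root would execute them.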

This same Ubuntu system once had a problem where ntpd mysteriously
exited (probably when a libc or similar update was installed, which
kills and restarts services), so I'm thinking it's time to put some
monitoring in place, but I'd like something fairly lightweight. A
script or maybe a monit config. Though simply checking that ntpd is
running wouldn't do it. It needs to periodically check the delta
between itself and another server and complain when a threshold is
exceeded. That seems to be beyond what monit can do.

  -Tom

--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/
_______________________________________________
Discuss mailing list
Discuss@...
http://lists.blu.org/mailman/listinfo/discuss

Re: monitoring NTP

by Dan Ritter-2

On Thu, Jul 02, 2009 at 08:11:45PM -0400, Tom Metro wrote:
> This same Ubuntu system once had a problem where ntpd mysteriously
> exited (probably when a libc or similar update was installed, which
> kills and restarts services), so I'm thinking it's time to put some
> monitoring in place, but I'd like something fairly lightweight. A
> script or maybe a monit config. Though simply checking that ntpd is
> running wouldn't do it. It needs to periodically check the delta
> between itself and another server and complain when a threshold is
> exceeded. That seems to be beyond what monit can do.

You need your monitor to parse the output of

ntpq -n -c rv $host

and compare to local time.
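
A minimal sketch of the parsing step, under some assumptions: the query
is pointed at the local ntpd (so the reported offset describes the local
clock), the output contains an "offset=" variable, and that value is in
milliseconds. rv_offset_ms and offset_exceeds are hypothetical helper
names, not part of ntpq.

```shell
# Print the value of the "offset=" variable from ntpq "rv" output on stdin.
rv_offset_ms() {
    tr ',' '\n' |
        sed -n 's/^[[:space:]]*offset=\([-+0-9.eE]*\).*$/\1/p' |
        head -n 1
}

# Exit 0 when |offset| is greater than the limit (same unit for both).
offset_exceeds() {
    awk -v off="$1" -v lim="$2" 'BEGIN {
        if (off < 0) off = -off
        exit (off > lim) ? 0 : 1
    }'
}
```

A wrapper might capture the result of "ntpq -n -c rv | rv_offset_ms" in a
variable, treat an empty result as "unknown", and alert when
offset_exceeds succeeds against a limit such as 500 (an arbitrary
example value).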

I have such a thing, but it's written for mon, not monit.

-dsr-



--
http://tao.merseine.nu/~dsr/eula.html is hereby incorporated by reference.

You can't defend freedom by getting rid of it.

Re: monitoring NTP

by Richard Pieri

On Jul 2, 2009, at 8:11 PM, Tom Metro wrote:
> ntpdate, then restart ntpd. I thought ntpd stepped the time if there was
> a large delta. (According to /etc/default/ntp the -g option is being
> specified, which is supposed to permit ntpd to make large steps when
> initially started.)


-g is the sanity check.  If the system time is more than -g's value
off (default 1000 seconds) then ntpd says "see ya!" and quits.
Setting it to 0 should prevent ntpd from quitting.  Running ntpdate as
you did before restarting ntpd accomplishes the same thing but does it
much faster.  ntpdate sets the time now, whereas ntpd with a large
clock skew will take a while to sync up.

--Rich P.


Re: monitoring NTP

by Rich Braun

Tom Metro <tmetro-blu@...> asked:
> [ntpd monitoring script] needs to periodically check the delta
> between itself and another server and complain when a threshold is
> exceeded.

I use the output of 'ntpdc -p' to verify that the local server is in sync with
at least one non-local one, which catches 99% of problems.

Below is a simple Perl script which works with Nagios.

-rich

#!/usr/bin/perl
# $Id: check_valid_time.pl,v 1.2 2009/05/13 18:37:43 rbraun Exp $
# Nagios check: offset (in seconds) between this host and its sync peer.

use strict;
use warnings;

my $ntpdc = '/usr/sbin/ntpdc';

my $warn_thresh = 0.75;   # seconds
my $crit_thresh = 100;    # seconds

# Nagios plugin exit codes
my $ok_exit       = 0;
my $warning_exit  = 1;
my $critical_exit = 2;
my $unknown_exit  = 3;

# The current sync peer is the 'ntpdc -p' line that starts with '*'.
# Field 1 is the peer name and field 7 is the offset.
my ($auth_server, $offset) = ('', '');
for my $line (`$ntpdc -p`) {
    next unless $line =~ /^\*/;
    ($auth_server, $offset) = (split ' ', $line)[0, 6];
    last;
}
$auth_server = '' unless defined $auth_server;
$offset      = '' unless defined $offset;

if ($auth_server eq '') {
    print "No auth server exists\n";
    exit $critical_exit;
}

if ($auth_server =~ m/LOCAL/) {
    print "$auth_server server untrusted for NTP check, offset $offset\n";
    exit $critical_exit;
}

# Validate before calling abs(): a non-numeric string would otherwise be
# converted to 0 and reported as in sync.
if ($offset !~ /^[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?$/) {
    print "$offset from $auth_server could not be evaluated.\n";
    exit $unknown_exit;
}

$offset = abs($offset);

if ($offset >= $crit_thresh) {
    print "$offset seconds from $auth_server.\n";
    exit $critical_exit;
}
elsif ($offset >= $warn_thresh) {
    print "$offset seconds from $auth_server.\n";
    exit $warning_exit;
}
else {
    print "$offset seconds from $auth_server.\n";
    exit $ok_exit;
}

Re: NTP

by Tom Metro-16

Richard Pieri wrote:
> Tom Metro wrote:
>> I thought ntpd stepped the time if there was a large delta.
>> (According to /etc/default/ntp the -g option is being specified,
>> which is supposed to permit ntpd to make large steps when initially
>> started.)
>
> -g is the sanity check.  If the system time is more than -g's value  
> off (default 1000 seconds) then ntpd says "see ya!" and quits.  
> Setting it to 0 should prevent ntpd from quitting.

Setting it to 0? It doesn't appear to take a parameter. The man page says:

   -g  Normally, ntpd exits with a message to the system log if the
       offset exceeds the panic threshold, which is 1000 s by default.
       This option allows the time to be set to any value without
       restriction; however, this can happen only once. If the
       threshold is exceeded after that, ntpd will exit with a message
       to the system log. This option can be used with the -q and -x
       options.

So without -g, if delta > 1000s it bails. With it, it accepts the big
change, but only once.

That does match my observation that ntpd kept running; it just didn't
step the time as expected. However, -g doesn't address the step vs. slew
issue, only whether the process considers it a fatal situation.

-x is suggested as the way to address the step vs. slew, but the wording
for that switch is convoluted:

   -x  Normally, the time is slewed if the offset is less than the step
       threshold, which is 128 ms by default, and stepped if above the
       threshold. ...

OK, so that should mean that without -x, a delta greater than 128 ms
will result in a step. That sounds fine, as a big delta (thousands of
seconds) qualifies, and I want it to step, so I shouldn't need to
specify -x, yet this doesn't match the observed behavior.

It goes on:

       ...  This option sets the threshold to 600 s, which is
       well within the accuracy window to set the clock manually.

So specifying -x makes ntpd *less* likely to step the time. That's not
what I want, and besides, the delta I experienced was hours, so even if
this switch had been specified, it still should have stepped the time.


> running ntpdate...accomplishes the same thing but does it  
> much faster.

The ntpd man page says:

   -q  Exit the ntpd just after the first time the clock is set. This
       behavior mimics that of the ntpdate program, which is to be
       retired. The -g and -x options can be used with this option.

So they're suggesting that:
ntpd -q -g

emulates the behavior of ntpdate, but I'm skeptical. Maybe this ntpd is
buggy?


> ntpdate sets the time now whereas ntpd with a large  
> clock skew will take a while to sync up.

If the delta is large enough, both should effectively do the same thing
(with, of course, ntpd staying running as a daemon and ntpdate exiting).

The ntpdate man page says:

   Time adjustments are made by ntpdate in one of two ways. If
   ntpdate determines the clock is in error more than 0.5 second it
   will simply step the time by calling the system settimeofday()
   routine. If the error is less than 0.5 seconds, it will slew the time
   by calling the system adjtime() routine.

So again there is a threshold, but in this case it is 1/2 second.
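
Read literally, the man page excerpts quoted in this message boil down
to the decision rules sketched below. This only restates the documented
thresholds (1000 s panic, 128 ms step, 600 s step with -x, 0.5 s for
ntpdate); it makes no claim about what this particular ntpd actually
did. ntpd_action and ntpdate_action are hypothetical names, and the G
and X arguments are an invented way of passing the flags.

```shell
# ntpd_action OFFSET_SECONDS G X
#   G=1: -g was given and its one-time allowance is still unused
#   X=1: -x was given
ntpd_action() {
    awk -v off="$1" -v g="$2" -v x="$3" 'BEGIN {
        if (off < 0) off = -off
        panic = 1000                    # panic threshold, seconds
        step = (x == 1) ? 600 : 0.128   # step threshold, seconds
        if (off > panic && g != 1) { print "exit" }
        else if (off > step)       { print "step" }
        else                       { print "slew" }
    }'
}

# ntpdate_action OFFSET_SECONDS
ntpdate_action() {
    awk -v off="$1" 'BEGIN {
        if (off < 0) off = -off
        if (off > 0.5) { print "step" } else { print "slew" }
    }'
}
```

By this reading, a two-hour offset with -g (ntpd_action 7200 1 0) should
come out as a step, which is what the thread expected and not what was
observed.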

  -Tom


Re: NTP

by Richard Pieri

On Jul 3, 2009, at 5:41 PM, Tom Metro wrote:
> Setting it to 0? It doesn't appear to take a parameter. The man page  
> says:

Different version, probably.  Leopard's version says this:

      -g      Normally, ntpd exits if the offset exceeds the sanity limit,
              which is 1000 s by default.  If the sanity limit is set to zero,
              no sanity checking is performed and any offset is acceptable.
              [...]

> -x is suggested as the way to address the step vs. slew, but the  
> wording for that switch is convoluted:

Yeah.  You don't want -x.  You don't have a clock slew problem.  You  
have (had) a dead clock battery problem.  ntpdate is what you really  
need until you get that battery replaced (or until the utility goes  
away and ntpd -q works right).

--Rich P.


Re: monitoring NTP

by Tom Metro-16

Dan Ritter wrote:
> You need your monitor to parse the output of
>
> ntpq -n -c rv $host
>
> and compare to local time.

Thanks.

So you mean parsing the time string out of:

[...]
reftime=cdf8176a.fdf3a0c5  Fri, Jul  3 2009  1:36:42.991, poll=6,
clock=cdf81781.889e625e  Fri, Jul  3 2009  1:37:05.533, state=4,
offset=-11.510, frequency=-19.803, jitter=8.887, noise=5.759,

My first thought was to use offset, but that appears to be the offset
between the remote server and its upstream peer.

The HTML documentation lists the variables, but never really defines
their meaning. I guess they defer to the RFC, which does define them.

Looks like it may be easier to just use Net::NTP in Perl, if it's going
to require a script anyway, and that approach reduces dependency on
the code being monitored (technically ntpq is separate from ntpd).

For example, this will report the offset between the local machine and a
specified server:

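use Net::NTP;              # exports get_ntp_response()
use Time::HiRes qw(time);  # sub-second local time, to match %.5f
my $SERVER = 'ntp.example.com';
my %response = get_ntp_response($SERVER);
# Positive: server clock is ahead of local. Network delay is ignored.
printf "Offset: %.5f s\n", $response{'Transmit Timestamp'} - time;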


I'll extend that to compare the delta to a threshold, and log a critical
error through syslog if it is exceeded. Then set the script up to run
occasionally from cron.
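
The threshold-and-syslog part might look something like the sketch
below, assuming the delta in seconds has already been obtained (for
instance from the Perl snippet above). report_delta, LOG_CMD and the
"ntp-check" tag are hypothetical names; logger with -p and -t is the
standard way to write to syslog from a script.

```shell
# Command used to raise the alarm; override LOG_CMD to test without syslog.
LOG_CMD=${LOG_CMD:-logger -p daemon.crit -t ntp-check}

# report_delta DELTA_SECONDS LIMIT_SECONDS
# Logs and returns 1 when |delta| exceeds the limit, returns 0 otherwise.
report_delta() {
    delta=$1
    limit=$2
    if awk -v d="$delta" -v l="$limit" 'BEGIN {
           if (d < 0) d = -d
           exit (d > l) ? 0 : 1
       }'; then
        $LOG_CMD "clock offset ${delta}s exceeds ${limit}s"
        return 1
    fi
    return 0
}
```

A crontab entry along the lines of "*/15 * * * * /usr/local/bin/ntp-delta-check"
(path and interval are placeholders) would then run a wrapper script
that calls it periodically.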

  -Tom
