monitoring NTP

I rebooted a MythTV server coincidentally when local DNS happened to not be working, and when ntpd started, it wasn't able to resolve the address of the server it was set to synchronize with. Looking at the logs, it says it gave up after a few tries, which is disappointing.

Even if it had resumed trying later, it isn't clear that it would have corrected the system time (which was off by hours, due to a dead motherboard battery), as a manual restart of ntpd didn't resolve the problem. I had to stop it (otherwise there is a port conflict), run ntpdate, then restart ntpd.

I thought ntpd stepped the time if there was a large delta. (According to /etc/default/ntp the -g option is being specified, which is supposed to permit ntpd to make large steps when initially started.)

This same Ubuntu system once had a problem where ntpd mysteriously exited (probably when a libc or similar update was installed, which kills and restarts services), so I'm thinking it's time to put some monitoring in place, but I'd like something fairly lightweight. A script or maybe a monit config. Though simply checking that ntpd is running wouldn't do it. It needs to periodically check the delta between itself and another server and complain when a threshold is exceeded. That seems to be beyond what monit can do.

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/
_______________________________________________
Discuss mailing list
Discuss@...
http://lists.blu.org/mailman/listinfo/discuss

Re: monitoring NTP

On Thu, Jul 02, 2009 at 08:11:45PM -0400, Tom Metro wrote:
> This same Ubuntu system once had a problem where ntpd mysteriously
> exited (probably when a libc or similar update was installed, which
> kills and restarts services), so I'm thinking it's time to put some
> monitoring in place, but I'd like something fairly lightweight. A
> script or maybe a monit config. Though simply checking that ntpd is
> running wouldn't do it. It needs to periodically check the delta
> between itself and another server and complain when a threshold is
> exceeded. That seems to be beyond what monit can do.

You need your monitor to parse the output of

  ntpq -n -c rv $host

and compare to local time.

I have such a thing, but it's written for mon, not monit.

-dsr-

-- 
http://tao.merseine.nu/~dsr/eula.html is hereby incorporated by reference.
You can't defend freedom by getting rid of it.
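The parse-and-compare approach suggested above could look roughly like the following. This is a minimal Python sketch, not the mon script mentioned in the message: the function names are hypothetical, and it assumes the `clock` variable is printed as an 8.8-digit hexadecimal NTP timestamp (seconds since 1900, then a 32-bit fraction), which is how it appears in typical `ntpq` output.

```python
import re
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP) to 1970-01-01 (Unix)

def parse_rv(text):
    """Collect name=value pairs from ntpq 'rv' output into a dict of strings."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}

def ntp_hex_to_unix(stamp):
    """Convert an NTP hex timestamp such as 'cdf81781.889e625e' to Unix time.

    Assumes the fraction part is exactly 8 hex digits (32 bits).
    """
    seconds, fraction = stamp.split(".")
    return int(seconds, 16) - NTP_EPOCH_OFFSET + int(fraction, 16) / 2**32

def clock_delta(rv_text, local_time=None):
    """Return the remote 'clock' value minus local time, in seconds."""
    if local_time is None:
        local_time = time.time()
    return ntp_hex_to_unix(parse_rv(rv_text)["clock"]) - local_time
```

To use it, capture the output of `ntpq -n -c rv $host` (for example with `subprocess.run`) and pass the text to `clock_delta`. The result ignores network delay and the time ntpq itself takes, so it is only suitable for catching deltas of seconds or more.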

Re: monitoring NTP

On Jul 2, 2009, at 8:11 PM, Tom Metro wrote:
> ntpdate, then restart ntpd. I thought ntpd stepped the time if there
> was a large delta. (According to /etc/default/ntp the -g option is
> being specified, which is supposed to permit ntpd to make large steps
> when initially started.)

-g is the sanity check. If the system time is more than -g's value off (default 1000 seconds) then ntpd says "see ya!" and quits. Setting it to 0 should prevent ntpd from quitting.

Running ntpdate as you did before restarting ntpd accomplishes the same thing but does it much faster. ntpdate sets the time now, whereas ntpd with a large clock skew will take a while to sync up.

--Rich P.

Re: NTP

Richard Pieri wrote:
> Tom Metro wrote:
>> I thought ntpd stepped the time if there was a large delta.
>> (According to /etc/default/ntp the -g option is being specified,
>> which is supposed to permit ntpd to make large steps when initially
>> started.)
>
> -g is the sanity check. If the system time is more than -g's value
> off (default 1000 seconds) then ntpd says "see ya!" and quits.
> Setting it to 0 should prevent ntpd from quitting.

Setting it to 0? It doesn't appear to take a parameter. The man page says:

  -g  Normally, ntpd exits with a message to the system log if the
      offset exceeds the panic threshold, which is 1000 s by default.
      This option allows the time to be set to any value without
      restriction; however, this can happen only once. If the
      threshold is exceeded after that, ntpd will exit with a message
      to the system log. This option can be used with the -q and -x
      options.

So without -g, if delta > 1000 s it bails. With it, it accepts the big change, but only once.

That does match my observation that ntpd kept running; it just didn't step the time as expected. However, -g doesn't address the step vs. slew issue, only whether the process considers it a fatal situation.

-x is suggested as the way to address the step vs. slew, but the wording for that switch is convoluted:

  -x  Normally, the time is slewed if the offset is less than the step
      threshold, which is 128 ms by default, and stepped if above the
      threshold. ...

OK, so that should mean that without -x, a delta greater than 128 ms will result in a step. That sounds fine, as a big delta (thousands of seconds) qualifies, and I want it to step, so I shouldn't need to specify -x, yet this doesn't match the observed behavior. It goes on:

      ... This option sets the threshold to 600 s, which is well
      within the accuracy window to set the clock manually.

So specifying -x makes ntpd *less* likely to step the time. That's not what I want, but besides, the delta I experienced was hours, so even if this switch was specified, it still should have stepped the time.

> running ntpdate...accomplishes the same thing but does it
> much faster.

The ntpd man page says:

  -q  Exit the ntpd just after the first time the clock is set. This
      behavior mimics that of the ntpdate program, which is to be
      retired. The -g and -x options can be used with this option.

So they're suggesting that:

  ntpd -q -g

emulates the behavior of ntpdate, but I'm skeptical. Maybe this ntpd is buggy?

> ntpdate sets the time now whereas ntpd with a large
> clock skew will take a while to sync up.

If the delta is large enough, both should effectively do the same thing (with, of course, ntpd staying running as a daemon and ntpdate exiting). The ntpdate man page says:

  Time adjustments are made by ntpdate in one of two ways. If ntpdate
  determines the clock is in error more than 0.5 second it will simply
  step the time by calling the system settimeofday() routine. If the
  error is less than 0.5 seconds, it will slew the time by calling the
  system adjtime() routine.

So again there is a threshold, but in this case it is 1/2 second.

 -Tom
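The thresholds quoted from the man pages in this message can be summarized as a small decision table. The Python sketch below models only what the quoted text says (1000 s panic threshold, 128 ms step threshold, 600 s with -x, 0.5 s for ntpdate); the function and parameter names are hypothetical, and it is not a description of what ntpd actually does internally, which, as the message notes, did not match the documentation on this system.

```python
PANIC_THRESHOLD_S = 1000.0   # ntpd exits above this unless -g applies
STEP_THRESHOLD_S = 0.128     # default ntpd step threshold
STEP_THRESHOLD_X_S = 600.0   # step threshold when -x is given
NTPDATE_THRESHOLD_S = 0.5    # ntpdate steps above this, slews below

def ntpd_documented_action(offset_s, g_flag=False, x_flag=False,
                           first_adjustment=True):
    """Return 'exit', 'step', or 'slew' according to the quoted man page text."""
    magnitude = abs(offset_s)
    # -g lifts the panic check, but only for the first adjustment.
    if magnitude > PANIC_THRESHOLD_S and not (g_flag and first_adjustment):
        return "exit"
    threshold = STEP_THRESHOLD_X_S if x_flag else STEP_THRESHOLD_S
    return "step" if magnitude > threshold else "slew"

def ntpdate_documented_action(offset_s):
    """Return 'step' or 'slew' according to the quoted ntpdate man page text."""
    return "step" if abs(offset_s) > NTPDATE_THRESHOLD_S else "slew"
```

Under this reading, an offset of several hours with -g at startup yields "step" with or without -x, which is why the observed failure to step looks like a mismatch with the documentation.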

Re: NTP

On Jul 3, 2009, at 5:41 PM, Tom Metro wrote:
> Setting it to 0? It doesn't appear to take a parameter. The man page
> says:

Different version, probably. Leopard's version says this:

  -g  Normally, ntpd exits if the offset exceeds the sanity limit,
      which is 1000 s by default. If the sanity limit is set to zero,
      no sanity checking is performed and any offset is acceptable.
      [...]

> -x is suggested as the way to address the step vs. slew, but the
> wording for that switch is convoluted:

Yeah. You don't want -x. You don't have a clock slew problem. You have (had) a dead clock battery problem. ntpdate is what you really need until you get that battery replaced (or until the utility goes away and ntpd -q works right).

--Rich P.

Re: monitoring NTP

Dan Ritter wrote:
> You need your monitor to parse the output of
>
> ntpq -n -c rv $host
>
> and compare to local time.

Thanks. So you mean parsing the time string out of:

  [...]
  reftime=cdf8176a.fdf3a0c5  Fri, Jul  3 2009  1:36:42.991, poll=6,
  clock=cdf81781.889e625e  Fri, Jul  3 2009  1:37:05.533, state=4,
  offset=-11.510, frequency=-19.803, jitter=8.887, noise=5.759,

My first thought was to use offset, but that appears to be the offset between the remote server and its upstream peer. The HTML documentation lists the variables, but never really defines their meaning. I guess they defer to the RFC, which does define them.

Looks like it may be easier to just use Net::NTP in Perl, if it's going to require a script anyway, and that approach reduces dependency on the code being monitored (technically ntpq is separate from ntpd). For example, this will report the offset between the local machine and a specified server:

  use strict;
  use warnings;
  use Net::NTP;
  use Time::HiRes qw(time);  # built-in time() returns whole seconds only

  my $SERVER = 'ntp.example.com';
  my %response = get_ntp_response($SERVER);
  # Server transmit time minus local time; network delay is ignored.
  printf "Offset: %.5f s\n", $response{'Transmit Timestamp'} - time;

I'll extend that to compare the delta to a threshold, and log a critical error through syslog if it is exceeded. Then set the script up to run occasionally from cron.

 -Tom
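The extension described above (compare the delta to a threshold, log a critical error through syslog) might look like the following. It is a hedged Python sketch rather than the Perl script itself: it sends a single SNTP client request using only the standard library, and the function names and the 5-second default threshold are assumptions made for illustration. Like the Perl one-liner, it subtracts local time from the server's Transmit Timestamp and ignores network delay.

```python
import socket
import struct
import syslog
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP) to 1970-01-01 (Unix)
THRESHOLD_S = 5.0              # hypothetical alert threshold, in seconds

def transmit_time(packet):
    """Extract the Transmit Timestamp (bytes 40-47 of the reply) as Unix time."""
    if len(packet) < 48:
        raise ValueError("short NTP packet: %d bytes" % len(packet))
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32

def query_offset(server, timeout=5.0):
    """Send one SNTP client request; return server time minus local time."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
    return transmit_time(packet) - time.time()

def check_server(server, threshold=THRESHOLD_S):
    """Log a critical syslog message if the offset exceeds the threshold."""
    offset = query_offset(server)
    if abs(offset) > threshold:
        syslog.syslog(syslog.LOG_CRIT,
                      "clock offset %.3f s against %s exceeds %.1f s"
                      % (offset, server, threshold))
    return offset
```

Calling `check_server("ntp.example.com")` from a small script run by cron would complete the setup. A DNS failure like the one that started this thread surfaces as `socket.gaierror`, and an unresponsive server as a timeout (`socket.timeout`), so a cron wrapper would probably want to catch `OSError` and log that as well.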
