Handling clock drift

All modern computers have clocks… But, to save money, manufacturers use very cheap hardware, which makes for inaccurate clocks… Now, if you have NTP syncing you to an atomic clock somewhere, why does it matter if your clock drifts? Because values like “SysUpTime” don’t get corrected when NTP updates the system clock on a router, for instance.

Let’s just say that I’ve found that a typical Cisco 2651XM router drifts by about 5 minutes every 4 to 6 months or so…
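To put that figure in more familiar terms, here is a small back-of-the-envelope sketch that converts “5 minutes every 4 to 6 months” into parts per million (ppm), the unit oscillator accuracy is usually quoted in. The 120- and 180-day values are my own rough stand-ins for 4 and 6 months, and `drift_ppm` is just a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(drift_seconds: float, elapsed_days: float) -> float:
    """Return clock drift in parts per million (ppm)."""
    return drift_seconds / (elapsed_days * SECONDS_PER_DAY) * 1e6


# Assumption: approximate "4 to 6 months" as 120 to 180 days.
fast = drift_ppm(5 * 60, 120)  # 5 minutes over ~4 months
slow = drift_ppm(5 * 60, 180)  # 5 minutes over ~6 months
print(f"Estimated drift: {slow:.1f} to {fast:.1f} ppm")
```

So these routers appear to be off by roughly 20 to 30 ppm, which is in the range you'd expect from an inexpensive, uncompensated crystal.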

Using SNMP (via some snazzy PHP scripts I wrote), I’m pulling the SysUpTime from about 950 Cisco 2651XM routers every day. From that, I calculate back to find out when each router rebooted and compare that to a database. To account for small differences in timing caused by network latency, my original routines allowed as much as 5 minutes of difference between the calculated reboot time and the reboot time in the database… If the difference is more than that, the router is assumed to have rebooted, and the calculated time is stored in the database so we can easily keep track of the last time the router rebooted… (There’s an audit table too, which lets us track all the database changes, but I’ll save that for another post.) If the difference between the old database value and the new calculated time is 5 minutes or less, I assume there was a small amount of variation due to how busy the circuits were, etc., ignore it, and leave the old value in the database. Oh, and an email is broadcast to the people interested in these routers whenever any of them are found to have rebooted or had other changes applied.
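The original check can be sketched roughly like this. It's a minimal Python illustration, not the actual PHP scripts: it assumes the SNMP poll has already returned the raw sysUpTime value (TimeTicks, i.e. hundredths of a second) and that the poll timestamp comes from the polling server's NTP-synced clock. The function names `reboot_time` and `has_rebooted` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Slack for network latency and polling jitter (the original 5-minute rule).
TOLERANCE = timedelta(minutes=5)


def reboot_time(poll_time: datetime, sys_uptime_ticks: int) -> datetime:
    """Work back from the poll time; sysUpTime counts hundredths of a second."""
    return poll_time - timedelta(seconds=sys_uptime_ticks / 100)


def has_rebooted(calculated: datetime, stored: Optional[datetime]) -> bool:
    """Treat a difference of more than TOLERANCE as a new reboot."""
    if stored is None:
        return True  # nothing recorded yet, so store the calculated time
    return abs(calculated - stored) > TOLERANCE


# Usage: a router polled at 16:00 that reports three days of uptime.
poll = datetime(2005, 5, 5, 16, 0)
calculated = reboot_time(poll, 3 * 86_400 * 100)
print(calculated, has_rebooted(calculated, datetime(2005, 5, 2, 15, 58)))
```

A 2-minute mismatch like the one in the usage line falls inside the tolerance, so the stored value would be left alone.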

After this had been running for a few months, I noticed that I was getting unusual info in these alert emails… For example, a few days ago the email told me that the new “RouterRebootTime” was something like Nov 25th, 2004 at 09:04, and that the old “RouterRebootTime” was Nov 25th, 2004 at 08:59. Now, remember, months had passed since November. Other routers listed in the same email had reboot date/timestamps from the current week (let’s say this started in March)…

At first, I saw these about once a week… Then they picked up to a few times a week… Recently, I’ve been getting one or two of these a day. Because the drift keeps accumulating the longer a router stays up, the further we get from the originally calculated reboot time, the more machines show how badly they drift…
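Here's a rough model of why the false alarms keep getting more frequent. This is a sketch under assumptions: the poll timestamp is taken from an accurate, NTP-synced clock, while sysUpTime is counted by the router's own uncorrected oscillator, so the error in the calculated reboot time grows in proportion to uptime. If the router's clock runs slow, sysUpTime undercounts and the calculated reboot time creeps later, as in the Nov 25th example. The ppm figures and function names are illustrative, not measured.

```python
from datetime import timedelta

SECONDS_PER_DAY = 86_400


def apparent_shift(uptime: timedelta, drift_ppm: float) -> timedelta:
    """Error in the calculated reboot time after `uptime` of real time,
    assuming the router's tick counter is off by `drift_ppm` parts per million."""
    return uptime * (drift_ppm / 1e6)


def days_until_exceeds(tolerance: timedelta, drift_ppm: float) -> float:
    """Days of uptime before the accumulated error passes `tolerance`."""
    return tolerance.total_seconds() / (drift_ppm / 1e6) / SECONDS_PER_DAY


# Assumed drift rates, bracketing "5 minutes every 4 to 6 months".
for ppm in (20, 30):
    days = days_until_exceeds(timedelta(minutes=5), ppm)
    print(f"{ppm} ppm: passes the 5-minute tolerance after ~{days:.0f} days")
```

Since each router was last rebooted at a different time, they cross the 5-minute line one by one, which matches alerts trickling in at an increasing rate.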

Anyhow, I changed my methodology for handling these… If the new calculated reboot time is more than a week old, I now compare it against the database value, and I only update the database if the two differ by 24 hours or more… If the calculated reboot time is less than a week old, I look for the 5-minute difference, just like I used to…
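The revised rule could look something like the following. Again, this is a hypothetical Python sketch rather than the real PHP code; `should_update` and the constant names are made up for illustration, and `now` is assumed to be the NTP-synced time of the poll.

```python
from datetime import datetime, timedelta
from typing import Optional

RECENT_WINDOW = timedelta(weeks=1)       # reboots newer than this use the tight check
SHORT_TOLERANCE = timedelta(minutes=5)   # original latency allowance
LONG_TOLERANCE = timedelta(hours=24)     # allowance once drift has had time to build up


def should_update(calculated: datetime, stored: Optional[datetime], now: datetime) -> bool:
    """Decide whether the calculated reboot time should replace the stored one."""
    if stored is None:
        return True
    diff = abs(calculated - stored)
    if now - calculated > RECENT_WINDOW:
        # Older than a week: only a jump of 24 hours or more counts as a reboot.
        return diff >= LONG_TOLERANCE
    # Within the last week: keep the original 5-minute rule.
    return diff > SHORT_TOLERANCE


now = datetime(2005, 5, 5, 16, 11)
# The Nov 25th false alarm is now ignored:
print(should_update(datetime(2004, 11, 25, 9, 4), datetime(2004, 11, 25, 8, 59), now))
```

A genuine reboot still gets caught quickly: the next daily poll produces a calculated time less than a day old, so the 5-minute check applies to it.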

So, these “false alarms” can still show up in my email, but since a router will now have to be 24 hours off, and so far I’ve only seen differences of about 5 minutes over several months, I don’t expect to see many of them any time soon… In fact, I imagine that sysUpTime will roll over before there is 24 hours of drift.
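A quick sanity check of that last claim: sysUpTime is an SNMP TimeTicks value, a 32-bit counter of hundredths of a second that wraps modulo 2^32. The sketch below compares when it wraps with how long 24 hours of drift would take at the roughly 20 to 30 ppm estimated earlier. Those ppm values are my rough estimates, not measurements, and `days_to_accumulate` is a hypothetical helper.

```python
TICKS_PER_SECOND = 100      # TimeTicks are hundredths of a second
COUNTER_MODULUS = 2 ** 32   # TimeTicks wrap modulo 2^32
SECONDS_PER_DAY = 86_400

rollover_days = COUNTER_MODULUS / TICKS_PER_SECOND / SECONDS_PER_DAY


def days_to_accumulate(drift_seconds: float, ppm: float) -> float:
    """Days of uptime needed to build up `drift_seconds` of error at `ppm`."""
    return drift_seconds / (ppm / 1e6) / SECONDS_PER_DAY


print(f"sysUpTime wraps after {rollover_days:.1f} days")
for ppm in (20, 30):
    days = days_to_accumulate(SECONDS_PER_DAY, ppm)  # 24 hours of drift
    print(f"{ppm} ppm: 24 h of drift takes {days:,.0f} days")
```

The counter wraps after about 497 days, while accumulating a full day of drift would take tens of thousands of days, so the rollover does come first by a wide margin.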

May 5, 2005 at 4:11 pm
