Hi All,
I'm having trouble stabilizing time among ESX servers that all use the same time source. NTP is configured to use a network device (Juniper NetScreen) as its time source, and this works and syncs as expected for a couple of days. After that, the time starts to drift slowly. I've tested several time sources, but every time the NTP client embedded in ESX v5.1 build 799733 eventually stops polling its peer and stops updating.
For example, ntpd is set to sync with 11.22.33.44.
Running tcpdump-uw -c 5 -n -i vmk0 host 11.22.33.44 and port 123 shows healthy output.
Running watch "ntpq -p localhost" shows healthy statistics like this:
Every 2s: ntpq -p localhost                              2013-07-23 12:24:04
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*11.22.33.44     .LOCL.           1 u  121  128  377    6.265  -1008.1 378.536
But after a couple of days, the "when" value (time since the last poll) keeps increasing and the peer is no longer polled:
Every 2s: ntpq -p localhost                              2013-07-23 12:11:50
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 11.22.33.44     .LOCL.           1 u  61d 1024  377    3.648  27994.3 925.134
Notice the "when" value, which has climbed to 61 days since the last poll even though the poll interval is 1024 seconds. ntpq reports an offset of about 28 seconds (27994.3 ms), while in reality the clock is more than 15 minutes behind.
ntpdc -c loopinfo shows a watchdog timer that is no longer increasing (it should be), as if the whole daemon has frozen. The only way to correct this is to restart the NTP daemon.
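For anyone who wants to check their own hosts for the same symptom, here is a rough Python sketch of how the two peer lines above could be told apart automatically. It is not something I run in production. The function names and the "more than twice the poll interval" threshold are my own assumptions, and it only strips the common tally characters (*, +, -, #) from the remote column.

```python
import re

# Multipliers for the unit suffixes ntpq appends to the "when" column.
_UNIT_SECONDS = {"": 1, "m": 60, "h": 3600, "d": 86400}


def when_to_seconds(when):
    """Convert a 'when' field such as '121', '5m', '3h' or '61d' to seconds.

    Returns None for '-', which ntpq prints when no packet was received yet.
    """
    if when == "-":
        return None
    match = re.fullmatch(r"(\d+)([mhd]?)", when)
    if match is None:
        raise ValueError("unrecognised 'when' value: %r" % when)
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_peer_line(line):
    """Parse one peer row of 'ntpq -p' output into a dict (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 columns, got %d" % len(fields))
    remote = fields[0]
    # Assumption: only these tally characters are handled; '*' marks the
    # peer ntpd is currently synchronised to.
    selected = remote.startswith("*")
    remote = remote.lstrip("*+-#")
    return {
        "remote": remote,
        "selected": selected,
        "when_s": when_to_seconds(fields[4]),
        "poll_s": int(fields[5]),
        "offset_ms": float(fields[8]),
    }


def is_stale(peer, factor=2):
    """True if the last poll is older than 'factor' poll intervals."""
    return peer["when_s"] is not None and peer["when_s"] > factor * peer["poll_s"]


healthy = parse_peer_line(
    "*11.22.33.44 .LOCL. 1 u 121 128 377 6.265 -1008.1 378.536")
stuck = parse_peer_line(
    "11.22.33.44 .LOCL. 1 u 61d 1024 377 3.648 27994.3 925.134")

for peer in (healthy, stuck):
    print(peer["remote"], "when=%ss" % peer["when_s"],
          "poll=%ss" % peer["poll_s"], "stale=%s" % is_stale(peer))
```

With the two sample lines, the first peer comes out as not stale (121 s against a 128 s poll interval) and the second as stale (61 days against 1024 s). The output of ntpq -p could be fed into parse_peer_line line by line, skipping the two header rows.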
This repeats itself every couple of days, give or take a few.
Any thoughts on this weird behaviour? Is anyone else experiencing the same issue?
Kind regards,
Arnold Veenema