Time zones, DST and gaps in monitoring

(This article was originally posted on the “monitoring” version of my site. I’m re-posting it here for posterity.)

I opened a ticket about this, and ultimately it ended up as a feature request. Before I go tilting at this particular windmill, let me frame the problem, which sits in the middle of multiple pollers, time zones, and DST.

  1. Pollers write data into SolarWinds (SW) using their own local time as the timestamp (a very small number of fields use UTC instead, but they are extremely limited).
  2. Thus, if you have pollers in multiple time zones, you effectively can’t correlate the timing of events, because simultaneous events will appear to have happened hours apart.
  3. Even worse, if you have pollers in location A polling devices in location B, you are that much further away from understanding the real time something happened. NOTE: we have exactly that. 4 pollers in EST, 4 in CST, and we aren’t fastidious about assigning only CST devices to the CST pollers. Plus, we have devices in China, Puerto Rico, etc., where we have NO pollers.
  4. Then there’s DST. We noticed during the last daylight saving time change that we suddenly had a 1-hour gap in data. Not because we stopped collecting, but because every timestamp “jumped” an hour into the future.
  5. Worse, we have a couple of alerts that are time-aware, meaning “IF a node doesn’t update in 30 minutes, create a ticket”. Suddenly 700 systems appeared not to have updated in over an hour.
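To make items 4 and 5 concrete, here is a minimal Python sketch of how a staleness check goes wrong across the spring-forward change. It is not SolarWinds code: the zone name, the 30-minute threshold constant, and the helper names `wall_clock_age` and `elapsed_age` are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

POLLER_ZONE = ZoneInfo("America/New_York")  # assumed zone for an Eastern poller
STALE_AFTER = timedelta(minutes=30)         # the "no update in 30 minutes" rule


def wall_clock_age(last_seen: datetime, now: datetime) -> timedelta:
    """Age as it looks when only local wall-clock values are compared."""
    return now.replace(tzinfo=None) - last_seen.replace(tzinfo=None)


def elapsed_age(last_seen: datetime, now: datetime) -> timedelta:
    """Age measured on the UTC timeline, which has no DST jumps."""
    return now.astimezone(timezone.utc) - last_seen.astimezone(timezone.utc)


# A node polled 5 minutes before US clocks sprang forward on 2024-03-10,
# then checked again 10 real minutes later.
last_poll_utc = datetime(2024, 3, 10, 6, 55, tzinfo=timezone.utc)
last_poll = last_poll_utc.astimezone(POLLER_ZONE)
check_time = (last_poll_utc + timedelta(minutes=10)).astimezone(POLLER_ZONE)

apparent = wall_clock_age(last_poll, check_time)
actual = elapsed_age(last_poll, check_time)

print(last_poll.strftime("%H:%M %Z"), "->", check_time.strftime("%H:%M %Z"))
print("wall-clock age:", apparent, "stale?", apparent > STALE_AFTER)
print("elapsed age:   ", actual, "stale?", actual > STALE_AFTER)
```

The wall-clock comparison reports 1 hour 10 minutes and trips the 30-minute rule, while the real elapsed time is only 10 minutes. That is the same false "stale" signal we saw on 700 systems at once.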

My first solution, setting all pollers to the same time zone, resolves most of the issues in items 1-3. But it doesn’t fix the DST shift. What would solve that is setting all the servers to a non-DST time zone like UTC. Standing in the way of that:

  • The time setting for your primary poller can’t be more than 5 minutes off from that of the database server (per SW tech support). Our database cluster hosts multiple other applications, and thus we cannot change the time on that system for love, money or a loaded gun.
  • NOTHING in any of the actual SolarWinds displays (graphs, charts, etc.) indicates the time zone. So (if we had an independent database server) we could move to UTC, but we would be answering “why does the chart say the error occurred at 2am” type questions until all 30,000 employees at my company had heard the answer at least 5 times each.
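For what it's worth, the second obstacle is the kind of thing a display layer could handle if timestamps were stored in UTC and labeled when shown. The following Python sketch only illustrates that idea. It says nothing about how SolarWinds renders charts, and `label_for_display` is a hypothetical helper, not an existing API.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+


def label_for_display(ts_utc: datetime, viewer_zone: str) -> str:
    """Render a stored UTC timestamp in the viewer's zone, with the zone spelled out."""
    if ts_utc.tzinfo is None:
        raise ValueError("expected a timezone-aware UTC timestamp")
    local = ts_utc.astimezone(ZoneInfo(viewer_zone))
    # Show both the abbreviation and the numeric offset, since abbreviations
    # such as "CST" are ambiguous between regions.
    return f"{local:%Y-%m-%d %H:%M %Z} (UTC{local:%z})"


# One stored event, shown to viewers in three different places.
stored = datetime(2024, 3, 10, 7, 5, tzinfo=timezone.utc)
for zone in ("America/New_York", "America/Chicago", "America/Puerto_Rico"):
    print(label_for_display(stored, zone))
```

With a label like this on every chart, the “why does it say 2am” question answers itself, because the zone is printed right next to the time.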

So forewarned is forearmed. Right now there is no way to resolve this except for the kissing-your-sister level answer of picking a “real” time zone and bracing yourself for data gaps and tickets during each daylight saving shift.
