ICYMI: Why Do We Put Up With Monitoring Solutions that Hurt?

(“In Case You Missed It Monday” is my chance to showcase something that I wrote and published in another venue but that is still relevant. This week’s post originally appeared on DataCenterJournal.com.)

Many years ago, one of my kids was bullied at school, but we didn’t find out about it for a while. As it turns out, things had become pretty bad—being pushed into lockers, having personal stuff messed with, even being roughed up for money a couple times. When the details finally came out, we found a good therapist and sat down to talk about things. In our first session, the therapist asked my child a question I’ll never forget. She said, “Tell me what a great day sounds like.”

My kid’s answer was like a punch to the gut. In a completely normal voice, the response was, “Oh, it would be great if the other kids left me half of my sandwich.” Here was a kid whose day had become so bleak, just being permitted to eat a whole lunch was unimaginable. Just a part of it would be an improvement.

I have discussions at work—about monitoring, of all things—that send me right back to that moment, gut punch and all.

I was consulting with the folks in charge of the network operation center—the ones that respond to issues at 2 a.m.—and gave them my usual spiel. In a breezy, new-age kind of voice (one that might gush about the secret power of amethyst crystals, faeries, and moonbeams) I said, “Tell me your secret monitoring dreams and wishes. What kinds of data would delight you to no end?”

The guy across the table said to me, without a hint of sarcasm, “Oh, it would be great if we could get alerts when nonroutable interfaces went down.”

Dropping the hippy act, I waved him off. “Yeah, of course. But what do you really want?”

“No, really,” he replied, serious and a bit intense. “If you could do that, it would change everything. We’ve been asking for it for five years, but nobody could do it for us.”

That specific conversation happened years ago, but I’ve had versions of it many times since. That first time, I was openly shocked. The second time, I was able to hide my dismay a bit better. Now, several years later, I’m more prepared when the conversation heads in this direction, but no less dismayed. Here’s why.

As IT professionals, we’re unwilling to accept suboptimal performance or function of any kind. We root our phones to get the latest (or unsupported) version; we overclock our systems; we file bug fixes and feature requests with vendors; we exploit backdoors; we memorize Konami-code combinations to access “god mode” in both our games and our workplace tool sets. So why do I so frequently meet otherwise demanding network engineers and sysadmins who passively accept monitoring that completely fails to meet their needs?

The results of the 2018 SolarWinds IT Trends Report: The Intersection of Hype and Performance support my observations. The report spoke to over 800 IT practitioners worldwide, and one shocking finding was that over half indicated their IT environments were failing to perform at optimal levels, while a similar percentage (again, over half) spent less than 25 percent of their time trying to optimize what they had.

It’s 2018. Cloud, containers and hybrid IT are a day-to-day reality; buzzword-worthy innovations such as digital transformation, AI and machine learning are on every executive’s lips and every technical manager’s mind; The Phoenix Project—the book that laid out how (and why) continuous improvement can be achieved—just released a 5th Anniversary Edition. But many data center managers are still being bullied by tools that are not just user unfriendly, but in some cases actively user hostile.

Having dug into this matter, I have some thoughts. First, monitoring implementations typically fall into one of two basic categories: “homegrown” and “corporate project.” In both cases, there’s a strong chance the end result will be less than stellar.

Homegrown monitoring solutions are usually done on the side, on the cheap and on the down-low. Once the one-man implementer has achieved “good enough for now,” he stops. The solution is rarely revisited for improvements (unless it’s broken), and if it monitors systems beyond the individual’s original needs (or the team who instigated the individual), it’s as a favor.

At the other end of the spectrum are corporate-project monitoring solutions. They usually involve some level of management edict, several flashy presentations by desperate vendors in expensive suites, and a selection of a tool that’s neither simple nor cheap and that inevitably becomes associated with one executive’s personal metric for success. Therefore, it’s shoehorned into all sorts of situations that the tool was never meant for because after all, we spent all this budget on it, and gosh darn it, we better get our money’s worth! The teams that are asked (told, strong-armed, etc.) to use the tool seldom have a vested interest themselves and may even have their own solutions (see “homegrown” above) that they feel suit their needs better.

Another aspect for both homegrown and corporate projects is knowledge and scope. It’s hard to get specialists from different teams (systems, network, storage and virtualization, not to mention all the assorted application groups) to agree on any set of monitoring metrics. It’s equally difficult to find monitoring professionals who are conversant in multiple areas so that they can bridge the gap. (In fact, we could just end that sentence with “it’s equally difficult to find monitoring professionals.” But more on that later.) And thus, in both cases, the monitoring options available to the rank-and-file technicians are usually ill fitting, feature sparse, and poorly promoted.

There’s a harsh reality here. Monitoring—solid, effective, robust, reliable monitoring that meets not just the needs but also the demands of multiple teams in an organization—is hard. It typically requires multiple tools, often from multiple vendors (something very few vendors are willing to admit and no salesperson would dare to say). But beyond the politics of it, it’s hard to get the right teams in the room, to have the right conversations, and to commit to the level of expertise both during the implementation phase and the subsequent usage to make any set of solutions truly effective. It’s so difficult that many IT pros have given up on it altogether. They just accept the status quo because they feel too many variables are out of their hands.

But in the same way parents reject the idea that the bullying of their child is “just the way things are,” I categorically refuse to accept that this is an acceptable state for any organization of any budget, any industry, any size. Monitoring, when designed and implemented correctly, can be a powerful force in an organization, enabling it to avoid downtime, performance degradation and cost; it’s also a morale boost to engineers who know they can rely on something besides their own eyes and gut to tell them everything is okay, as well as to quickly and accurately lead them to the root cause of a problem when it’s not.

There is a way to get to this promised land that I’m describing. Stay tuned for my next post to find out how.
