The Cost of (Not) Monitoring

Over on the SolarWinds “GeekSpeak” blog, I published a detailed analysis of how to measure the cost and value of monitoring. But I also wanted to post the long-form version of the story that drove the analysis. That’s what appears below:

What does a wireless thermometer have in common with ping? Both can keep a business from losing cash.

It should surprise nobody that one of the ways businesses stay in business is by keeping a tight rein on costs. So it also comes as no surprise that convincing companies to spend money on a monitoring solution can be… a challenge. To the average executive (and – let’s be honest – the mid-level manager listening to the technical ramblings of an excited but fiscally vague IT Pro) monitoring seems like pure sunk cost with no return.

Part of that is due to the aforementioned technical ramblings. I’ll deal with that in another article. Here I would like to address the impression that monitoring has no return on investment. It really just takes a single question:

How much will NOT monitoring cost you?

A glib and inexperienced bean counter will tell you that NOT monitoring is free.

Here’s one experience I had that may help illustrate the cost of monitoring versus the cost of NOT monitoring.

A friend of mine, a chef who managed food services for a 300-bed hospital, approached me to set up monitoring for his coolers and freezers. They were older units, and he’d had some food freeze when the temperature dropped below the setting. He wanted a way to know the temperature without walking into each freezer with his own thermometer and standing there waiting for 2 minutes.

To help him out, I suggested a setup that involved a few specialized temperature sensors, cabling, and a PC in the kitchen manager’s office running monitoring software and displaying the results.

The total cost was about $5,000 and would have allowed staff to see the current temperature in each of the coolers and freezers, as well as receive a notification (email or SMS) if the temperature was out of the acceptable range.
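For the technically inclined, the "notify if out of range" part is not complicated. Here is a minimal sketch of that check in Python. The unit name, the temperature range, and the sample readings are all placeholders I made up for illustration; they are not the hospital's actual settings, and a real setup would read from the sensors and send an email or SMS instead of printing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """A cooler or freezer with its acceptable temperature range (Fahrenheit)."""
    name: str
    low_f: float
    high_f: float


def check_reading(unit: Unit, reading_f: float) -> Optional[str]:
    """Return an alert message if the reading is out of range, else None."""
    if reading_f < unit.low_f:
        return f"{unit.name}: {reading_f:.1f}F is below the {unit.low_f:.1f}F minimum"
    if reading_f > unit.high_f:
        return f"{unit.name}: {reading_f:.1f}F is above the {unit.high_f:.1f}F maximum"
    return None


# Hypothetical range, for illustration only.
main_cooler = Unit("main cooler", low_f=33.0, high_f=40.0)

# Hypothetical readings: one in range, one too warm.
for reading in (36.5, 44.0):
    alert = check_reading(main_cooler, reading)
    if alert:
        print("ALERT:", alert)  # a real setup would send email or SMS here
```

Note that the same check covers both directions: a unit running too cold (the chef's original complaint) and a unit running too warm.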

The administration declined, saying it was too expensive just to know that a freezer was 5 degrees too cold.

So in this case, the cost OF monitoring was a cool (forgive the joke) $5,000.

Needless to say, something happened. One of the staff left the door to the main cooler open, causing the compressor to run all evening, until 2:00 am. That’s when the compressor died. At 6:00 am the first shift got in, only to find all the food in that cooler was spoiled.

Remember, this is a hospital. Food is not prepared-to-order. It’s prepped and cooked to 75% completion, stored in the cooler, and then heated the rest of the way before serving. For any given meal, food is usually prepped a day and a half ahead.

This failure affected the next 5 upcoming food services for 300 patients—from breakfast through lunch the following day. 1,500 meals were now in the dumpster. My friend the chef was notified and he began making short-notice orders to his food suppliers, calling in extra staff, and contacting a repair service for an emergency visit. His team worked long and frantic hours to get breakfast and lunch out the door, and begin re-prepping all the food they would need to catch up.

If you’ve worked in IT for more than 15 minutes, this should all sound hauntingly familiar, even if you are the kind of cook who can burn water. Critical application failures (and the fire drill that comes soon after) follow a similar script.

So what was the total cost of the outage, including the food that had to be thrown out, the replacement food, the equipment repair, and the additional staff? A cool $1 million: 200 times the cost of the monitoring system deemed “too expensive.”
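If you want to show the bean counters the math, it fits in a few lines. This is just a sketch using the round figures from this story (about $5,000 for monitoring, about $1 million for the outage, 5 meal services for 300 patients); the function and variable names are mine.

```python
def cost_ratio(outage_cost: float, monitoring_cost: float) -> float:
    """How many times over the outage would have paid for the monitoring."""
    return outage_cost / monitoring_cost


monitoring_cost = 5_000    # approximate quote for sensors, cabling, PC, software
outage_cost = 1_000_000    # approximate total cost of the cooler failure
meals_lost = 5 * 300       # 5 meal services for 300 patients

print(f"Meals lost: {meals_lost:,}")
print(f"Outage cost vs. monitoring cost: {cost_ratio(outage_cost, monitoring_cost):.0f}x")
```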

Now we all understand that the suggested monitoring system wouldn’t have averted the failu… wait. No, I take that back. It probably would have. At the end of that fateful shift when the staffer left the door open, the temperature would have started rising until it exceeded a threshold. At that point, the food in the cooler would still have been OK as long as something was done quickly. My friend would have gotten a notification, someone could have been called to check the cooler, the door would have been closed, the compressor wouldn’t have died, and a $1 million charge would have been avoided.

As IT Professionals, it behooves us to be able to explain – in clear terms that non-technical staff can understand – what is intuitively obvious to those of us in the trenches: knowing the cost of NOT monitoring is an important (and often overlooked) facet of the decision on what to monitor and which tool to use.
