ICYMI: Data, Information, Action

(“In Case You Missed It Monday” is my chance to showcase something that I wrote and published in another venue, but is still relevant. This week’s post originally appeared on DataCenterJournal.com)

One of the benefits of being a 30-year veteran of IT is that I’m able to appreciate certain consistencies within the industry. And sometimes, that enjoyment comes with a generous helping of cynicism.

For example, the term “big data” is about as buzzword-worthy as it gets right now, ranking up there with the other darlings of IT vendor pitches. Of course, big data has been around a lot longer than it took to make it onto marketing departments’ “but-how-can-we-use-this-to-sell-our-stuff?” lists. In fact, it goes back to the mid-1990s.

However, this is not an article about the history of the term “big data.”

I bring up the genesis of the term for context. When John Mashey and his friends were sitting around the Silicon Graphics lunch table and applied the term to our current understanding of data collection and storage, the cost to store just one terabyte of data was around $280,000. That’s assuming, of course, you had an array that could fit that many drives. Remember, it was in 1997 that IBM released its “Titan” hard drive, which featured five 3.5-inch platters holding what was then a whopping 16 gigabytes of data.

So I have to wonder, again with a generous serving of cynicism and snark, how big the big data really was in their minds.

Nevertheless, it was commonly understood—and had been long before the advent of computers themselves—that data and information were two very different animals. Data, whether small, medium, or economy-sized, could generally be found everywhere and collected for the price of a pencil and piece of paper. Information, however, required a little more effort.

Getting back to the present day, the saying “you can have data without information, but you cannot have information without data” may never have been so blindingly obvious or true. We are awash in seas of data, fed by thundering, swollen tributaries like the Internet of Things, mobile computing, and social media. The goal of big data, then, is to channel those raging rivers into meaningful insight.

However, this is not an article about the goals or opportunities presented by big data, either.

For almost 20 years, my specialty within the field of IT has been systems monitoring and management. Those who share my passion for finding ever newer and more creative ways to determine when, how, and if a server went bump in the night understand that data versus information is not really a dichotomy. It’s a triad.

Of course, good monitoring starts with data: lots of it, collected regularly from a variety of devices, applications, and sources across the data center. And of course, transforming that data into meaningful information, in the form of charts, graphs, tables, and even speedometers that represent the current status and health of critical services, is the heart of the work.

But unless that information leads to action, it’s all for naught. And that, patient reader, is what this article is about—the importance of taking that extra step to turn data-driven insight into actionable behavior. What is surprising to me is how often this point is overlooked. Let me explain:

Let’s say you diligently set up your monitoring to collect hard drive data for all of your critical servers. You’re not only collecting disk size and space used, but you also pull statistics on IOPS, read errors and write errors.

That’s Data.
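To make the triad a little more concrete, here is a minimal Python sketch of the “Data” step. It assumes a single host and polls only disk size and space used through the standard library’s `shutil.disk_usage`; IOPS and read/write error counters come from platform-specific sources and are left out. The `DiskSample` and `collect_disk_sample` names are hypothetical, not part of any monitoring product.

```python
import shutil
import time
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One raw data point: capacity and usage for a single mount at a moment in time."""
    timestamp: float
    mount: str
    total_bytes: int
    used_bytes: int

    @property
    def percent_used(self) -> float:
        # Derived value; the raw numbers above are the "data" itself.
        return 100.0 * self.used_bytes / self.total_bytes


def collect_disk_sample(mount: str = "/") -> DiskSample:
    """Poll the current size and space used for one mount point."""
    usage = shutil.disk_usage(mount)  # named tuple (total, used, free), in bytes
    return DiskSample(time.time(), mount, usage.total, usage.used)


if __name__ == "__main__":
    sample = collect_disk_sample("/")
    print(f"{sample.mount}: {sample.percent_used:.1f}% used of {sample.total_bytes} bytes")
```

In practice a collector like this would run on a schedule against every critical server and ship each sample to a central store.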

Now, let’s say your sophisticated and robust monitoring technology goes the extra mile. It not only converts those metrics into pretty charts and graphs, but also analyzes historical data to establish baselines, so that your alerts don’t just trigger when disk usage is over 90 percent, but rather when, say, disk usage jumps 50 percent above normal for a sustained period.

That’s Information.
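The difference between a static threshold and a baseline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular monitoring platform computes baselines: “normal” is taken to be the mean of historical readings, “50 percent over normal” is read as 1.5 times that mean, and “a certain time period” is modeled as three consecutive polls. All names are hypothetical.

```python
from statistics import mean
from typing import Sequence

FIXED_THRESHOLD_PCT = 90.0  # classic static rule: alert when usage exceeds 90%
SPIKE_FACTOR = 1.5          # assumption: "50 percent over normal" means 1.5x baseline
SUSTAIN_SAMPLES = 3         # assumption: the spike must persist for 3 consecutive polls


def static_alert(current_pct: float) -> bool:
    """Fire whenever a single reading crosses the fixed threshold."""
    return current_pct > FIXED_THRESHOLD_PCT


def baseline_alert(history_pct: Sequence[float], recent_pct: Sequence[float],
                   factor: float = SPIKE_FACTOR, sustain: int = SUSTAIN_SAMPLES) -> bool:
    """Fire only if the last `sustain` readings all exceed `factor` x the historical mean."""
    if not history_pct or len(recent_pct) < sustain:
        return False  # not enough data to judge what "normal" is
    baseline = mean(history_pct)
    window = recent_pct[-sustain:]
    # A single blip does not count; every reading in the window must be elevated.
    return all(value > baseline * factor for value in window)
```

With a history averaging 40 percent, three readings in the low 60s (for example `[45, 62, 63, 65]`) trip `baseline_alert` even though `static_alert` stays quiet. A server that always idles around 85 percent and creeps to 91 does the opposite: the static rule pages someone, while the baseline rule treats it as normal for that box.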

Now, let’s say you roll that monitoring out to all 5,000 of your critical servers and begin to “enjoy” about 375 “disk full” tickets per month.

That, sadly, is the normal state of affairs at most companies. It’s the point where, as a monitoring engineer (or, at the very least, the person in charge of the server monitoring), you begin to notice the dark looks and poorly hidden sneers from colleagues who have had “your” monitoring wake them one too many times at 2 a.m.

So, what’s missing? The answer is found in a simple question: Now what? Once you and the server team have hashed out the details of the disk full alert, the next thing you should ask is, “What should we do now? What’s our next step?” In this case, it would likely involve clearing the temp directory to see if that resolves the issue.

And the next logical step from there is automation. Often, the same monitoring platform that kicks up a fuss about a server being down at 2 a.m. can clear that nasty old temp directory for you, right then and there, all while you’re still sound asleep. Then, if and only if the problem persists, a ticket is cut so a human can get involved. And said human will know that before their precious beauty sleep was so rudely interrupted, the temp directory had already been cleared, so the cause is something a bit more sophisticated than that.
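The “clear the temp directory first, escalate second” flow could look something like this Python sketch, again under loud assumptions: the temp directory and the acceptable usage threshold are passed in by the caller, and `open_ticket` is a hypothetical hook standing in for whatever ticketing integration your environment actually uses. Real monitoring platforms wire this up through their own alert-action features, which this sketch does not attempt to model.

```python
import shutil
from pathlib import Path
from typing import Callable


def clear_directory(path: Path) -> int:
    """Delete everything inside `path` (but not `path` itself); return items removed."""
    removed = 0
    for entry in path.iterdir():
        try:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()  # regular files and symlinks
            removed += 1
        except OSError:
            pass  # in-use or permission-protected items are skipped, not fatal
    return removed


def handle_disk_full(mount: str, temp_dir: Path, threshold_pct: float,
                     open_ticket: Callable[[str], None]) -> bool:
    """Run the first-line fix automatically; escalate to a human only if it fails.

    Returns True if the automated action brought usage back under the threshold.
    """
    removed = clear_directory(temp_dir)

    # Re-check the disk after the cleanup instead of assuming it worked.
    usage = shutil.disk_usage(mount)
    pct = 100.0 * usage.used / usage.total
    if pct <= threshold_pct:
        return True  # resolved: nobody gets woken up

    # Still full: cut a ticket that records what was already tried.
    open_ticket(
        f"{mount} still at {pct:.1f}% after automatically clearing {removed} "
        f"item(s) from {temp_dir}; needs investigation."
    )
    return False
```

Skipping files that can’t be deleted rather than aborting is a deliberate choice: a partial cleanup may still free enough space, and if it doesn’t, the re-check escalates anyway. Because the ticket text says the temp directory was already cleared, whoever gets paged knows the easy fix is off the table.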

This type of automated action is neither difficult to understand nor super complicated to establish. But in the environments where I’ve personally implemented it, the result was a whopping 70 percent reduction in disk full tickets.

And the benefits of this type of thinking go beyond the reduced tickets themselves. That reduction represents hours saved, whether productive work hours or valuable overtime. It also represents trust regained, as systems monitoring becomes a source of insight rather than interruption, and a force multiplier thanks to its ability to react quickly and reliably at all hours.

It also shows the power of action to amplify information. By removing easily resolved incidents from the event stream, it’s now possible to see the forest despite the trees and detect more complex failure patterns. And this, in turn, means more chances for action in the future.

Because after 30 years in IT, I’ve acquired, along with my healthy serving of cynicism, the sure knowledge that there is always more work to be done.
