“How good are we, really?” It’s a question that engaged data center managers will ask themselves. Hopefully, it’s not one asked obsessively every day, but from time to time during moments of quiet introspection. It should be on the list of Deep Thoughts that one asks.
As with most things in IT, asking the question is important. But obtaining a correct answer is where the “work of the work” happens.
“You Keep Using That Word”
Before I dig into this issue, let me step back into the Deep Thoughts territory and put forth (and then address) one more question: What does it mean to be mature and why would anyone want that for the data center?
Although the answer may seem obvious at first, I suggest that a “good enough” data center may be sufficient. If, as Malcolm Gladwell proposes in Outliers, it costs 10,000 hours to become a recognized expert and leader in a field, what is the relative cost to simply be competent? While true world-class excellence is admirable, it’s also often unnecessary. I don’t need to be a world-class bowler; I just have to be good enough to enjoy a night out with friends. That doesn’t cost me much at all.
Likewise, if we equate maturity with excellence, and the cost to have a fully mature (i.e., world class) data center is exorbitantly high—regardless of whether we’re talking about cost in money, time, staff or some other parameter—then another question we have to ask is, “Does a world-class data center serve the needs of the business?” Often, the answer will be no. You just need to be good enough for your consumers—internal colleagues and external customers—to receive the expected level of service.
So, I want to be clear that when I say mature, I don’t use it as a synonym for “perfect”; rather, I mean sufficiently stable and robust to meet the needs of your business and be maintainable by your available staff. And understanding that (how capable your data center is at providing effective services, and how maintainable that level of service is with the staff at hand) should be important to anyone who manages or works in a data center environment.
The Capability Maturity Model
Luckily, there is a model for determining maturity, called the capability maturity model (CMM). The CMM has been used for everything from software development, which is where it was born, to product delivery and skyscraper construction.
Unfortunately, like Information Technology Infrastructure Library (ITIL), service-oriented architecture (SOA) and Six Sigma, the CMM is often shoehorned into uses where it’s less than optimal. After all, running a data center is little like developing a software application. To be clear, I’m not saying ITIL, SOA, Six Sigma or CMM are bad frameworks. They are brilliant and immensely useful. But they aren’t useful in all situations.
So, although understanding your relative level of data center maturity is important, using the CMM to do so is suboptimal. Thus, I’d like to present my thoughts on maturity models in general and how you can create one that is meaningful, relevant and effective for your particular environment.
Creating a Data Center Maturity Model All Your Own
The goal of a data center maturity model is to help you understand where your data center stands on a continuum, with no sophistication at one end and a fully optimized operation at the other. That means you need to think carefully about the categories that matter.
For example, cleanliness is certainly an important category in many situations, from operating rooms to commercial kitchens, but it may not be a crucial metric for a data center. Organization, on the other hand, is a good start. But your next thought must be, “Organization of what?” Are you only talking about the physical aspects, such as having all your spare cables color coded for use, sorted by length and readily available? Or does the idea of organization extend to manuals, tools, equipment, staff schedules, process reviews, and even online FAQs and knowledge bases?
Coming up with three to five categories shouldn’t be that hard: just think about the types of issues, tasks and activities you deal with every day. But as you think about those details, make every effort to group them in ways that emphasize the higher-level discipline. Examples include the following:
- Visualization: can you identify and view the status of all of the aspects of your data center? This covers everything from floor and rack maps—static or interactive—to monitoring that shows the current state of hardware, software, transactions and more.
- Capacity: do you know how much gas is in your tank? Do you know how fast you are burning it? Can you calculate when you’re likely to be empty on the basis of your current speed as well as the speed you normally drive at various times of the day, week and month? Fuel analogy aside, these questions apply equally well to storage, processors, memory, load balancing and more.
- Responsiveness: when a problem occurs, how do you become aware of it? What tools are in place to facilitate the initial reaction—including automated remediation, escalation, troubleshooting and ultimately keeping the mean time to repair (MTTR) low? Where is historic and tribal knowledge kept? Is it in people’s heads, in a book, online or in ticket history?
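The capacity questions above (how much is in the tank, and how fast you are burning it) come down to simple arithmetic. The following is a minimal sketch in Python, not a production forecasting tool. The function names and the sample numbers are hypothetical, and the sketch assumes equally spaced daily samples and a constant burn rate, so it ignores the time-of-day, weekly and monthly variation mentioned in the capacity item.

```python
def average_daily_growth(samples):
    """Average per-day change across equally spaced daily usage samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure growth")
    # Total change divided by the number of day-to-day intervals.
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def days_until_full(capacity, used, daily_growth):
    """Estimate the days left before `used` reaches `capacity`.

    Assumes a constant daily growth rate. Returns None when usage is
    flat or shrinking, because the resource never fills at that rate.
    """
    if daily_growth <= 0:
        return None
    remaining = max(capacity - used, 0)
    return remaining / daily_growth


# Hypothetical example: five daily readings of storage used, in TB.
readings = [600, 610, 620, 630, 640]
growth = average_daily_growth(readings)
print(days_until_full(capacity=1000, used=readings[-1], daily_growth=growth))
```

With these made-up readings, growth is 10 TB per day and 360 TB remain, so the projection comes to 36 days. The same arithmetic applies to processors, memory or any other resource you can sample at regular intervals.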
Once you have a set of categories, you can come up with questions that allow you to assess your maturity, sophistication and preparation in those areas. Avoid questions that have yes, no or SAT-style essay answers. Opt instead for questions that elicit answers indicating a rank. Typically, I aim for questions or statements whose answers fall on a scale of one to five. Then structure those answers so that one is at the poor end of the spectrum and five is at the optimal end, but put your answer for good enough in the middle, not at the optimal end. I’ll explain the reason in a bit. Examples include:
- When I go to bed at night, I’m confident that I can see the following fraction of what’s happening in my infrastructure:
  - 0–25% (I can never sleep!)
  - 26–50%
  - 51–75%
  - 76–95%
  - 96–100%
- Our tools help us reduce MTTR (compared with not having those tools) by roughly:
  - 0–5% (What’s MTTR?)
  - 6–25%
  - 26–50%
  - 51–75%
  - 76–100%
- We stay on top of capacity challenges by:
  - Observing system crashes
  - Each staff member keeping mental tabs on their pet or assigned systems
  - Checking systems at regular intervals, documenting the data and drawing our own conclusions
  - Using data to create a simple straight-line projection of overall usages (trending up, down or flat)
  - Using continuous automatic data collection to calculate per-element baselines, which we use to project when resources may run out as well as set alert thresholds on the basis of “normal” rather than a fixed number
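The top-rung capacity answer mentions alert thresholds based on “normal” rather than a fixed number. One common way to do that, shown here as a rough sketch and not as the method any particular monitoring tool uses, is to treat the mean of recent readings as the baseline and alert when a value exceeds it by some multiple of the standard deviation. The function names, the multiplier `k` and the sample data are all assumptions made for illustration.

```python
import statistics


def baseline_threshold(history, k=3.0):
    """Alert threshold for one element: its mean plus k standard deviations.

    `history` holds that element's recent readings (for example, CPU %).
    The multiplier k is a tunable assumption, not a standard value.
    """
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)  # population standard deviation
    return mean + k * spread


def is_abnormal(value, history, k=3.0):
    """True when `value` is above what is normal for this element."""
    return value > baseline_threshold(history, k)


# Hypothetical CPU readings for a server that normally idles around 40%.
cpu_history = [40, 42, 38, 41, 39]
print(round(baseline_threshold(cpu_history), 2))
print(is_abnormal(45, cpu_history))
print(is_abnormal(43, cpu_history))
```

A server that normally runs at 40% would trigger an alert at 45% under this scheme, whereas a fixed 90% threshold would stay silent. Each element gets its own threshold because each has its own history.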
Once you have your questions and answers fine-tuned, send them around to your team. Make the answers anonymous if you think doing so will encourage more-thoughtful and honest responses, but everyone should be eager to help improve the environment.
Once you get answers back, rate the results on a consistent scale; for example, use one to five, or use percentages. Then you can track the average result for each question but also make sure you roll up to a single, final average for each category.
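The roll-up described above can be sketched in a few lines of Python. This is one possible approach under stated assumptions: responses arrive as (category, question, score) tuples with scores from one to five, and a category's result is the average of its question averages, so every question counts equally no matter how many people answered it. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def roll_up(responses):
    """Average survey scores per question, then per category.

    `responses` is an iterable of (category, question, score) tuples,
    where score is a number from 1 to 5.
    Returns (per_question, per_category) dictionaries of averages.
    """
    by_question = defaultdict(list)
    for category, question, score in responses:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        by_question[(category, question)].append(score)

    # Average result for each question across all respondents.
    per_question = {key: fmean(scores) for key, scores in by_question.items()}

    # Single final average for each category, built from question averages.
    by_category = defaultdict(list)
    for (category, _question), average in per_question.items():
        by_category[category].append(average)
    per_category = {cat: fmean(avgs) for cat, avgs in by_category.items()}

    return per_question, per_category


# Hypothetical answers from a small team.
answers = [
    ("Visualization", "visibility", 4),
    ("Visualization", "visibility", 2),
    ("Visualization", "maps", 5),
    ("Capacity", "projection", 3),
]
question_avgs, category_avgs = roll_up(answers)
print(category_avgs)
```

If you would rather weight every individual response equally, average the raw scores per category instead. Either choice works as long as you apply it consistently from one assessment to the next.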
Good Enough
So why did I recommend putting your good-enough answers in the middle? Because, hopefully, you’re better than good enough in at least a couple of categories. I believe so because you are reading The Data Center Journal, so you have at least a passing interest in data center capabilities, and you’ve probably made your data center a better place simply by virtue of caring about it.
I also feel strongly that despite the skewing of the American educational system, a C grade does indeed mean average, not “you stink.” And in many situations, as I discussed at the start of this column, in practice, good enough often really is good enough.
Finally, my reasoning also has to do with how you will present the results. Despite the fact that each of the questions, if you follow my examples, yields a tidy five-step ladder, such as the one presented in a typical CMM, that’s not how you should show the results. Instead, it should look something like the following:
[Figure: radar chart of the assessment results. Source: SolarWinds]
The benefit of this type of display is that you can see where strengths in one area are helping to make up for gaps in others. Would you—or more realistically, your hypercompetitive upper management—like to see that whole radar filled? Of course—we all want to be superheroes. But we are usually unwilling to accept the cost that getting to all fives would entail, whether it means newer equipment, more-robust monitoring, larger staff, higher tiers of vendor SLA and so on. I take that back. More-robust monitoring does not have to cost a lot, so you should always try to have the best tools in place!
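The same reading of the results (strengths offsetting gaps, measured against a good-enough midpoint) can also be produced as a quick text summary when no charting tool is at hand. The sketch below is only an illustration: the function name, the midpoint of 3.0 on a one-to-five scale and the sample scores are assumptions, and it supplements the radar display without replacing it.

```python
def summarize(category_scores, good_enough=3.0):
    """Compare category averages against the good-enough midpoint.

    `category_scores` maps category name to its average score (1 to 5).
    Returns which categories sit above or below the midpoint, plus the
    overall average and whether that average clears the bar.
    """
    strengths = sorted(n for n, s in category_scores.items() if s > good_enough)
    gaps = sorted(n for n, s in category_scores.items() if s < good_enough)
    overall = sum(category_scores.values()) / len(category_scores)
    return {
        "strengths": strengths,
        "gaps": gaps,
        "overall": overall,
        "meets_bar": overall >= good_enough,
    }


# Hypothetical category averages from a completed assessment.
scores = {"Visualization": 4.0, "Capacity": 2.5, "Responsiveness": 3.5}
print(summarize(scores))
```

In this made-up example, Capacity falls short of good enough, but Visualization and Responsiveness pull the overall average above the midpoint.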
In Closing
Obviously, creating a maturity model and analyzing the data isn’t the same as actually improving your data center. And in some cases, the answer to “how can we improve?” may be obvious. But having an assessment tool that maps to a maturity model creates a repeatable process that allows you not only to identify areas for improvement, but also to assess your progress along the way.
And having well-documented, repeatable processes is a big part of data center maturity in the first place.