Want to Future-Proof Your IT Environment? Stay Out of Technical Debt and Unleash the Chaos Monkey

(“In Case You Missed It Monday” is my chance to showcase something that I wrote and published in another venue, but is still relevant. This week’s post was co-written with my fellow Head Geek Thomas LaRock, and originally appeared on VMBlog.)

Your first car may have had lousy seats, so you made your own seat covers. The engine wasn’t strong enough, but that was fine, because you gunned it on every hill. The car had other problems, too, but you had a workaround for each one. Eventually, the vehicle was more duct tape than steel.

Your workplace IT environment is probably like this, too. You have sort-of-okay equipment in place, and you’re getting by. It’s good enough for now, but keeping it running takes a great deal of extra work.

The result: You’re not future-proofed. You’re also running from crisis to crisis.

Understand Your Technical Debt

When you’re making do, you’re carrying what we call “technical debt”: code or IT products that are good enough for now but will need rework later. Being in technical debt isn’t always bad, as long as you fix that jury-rigged code before it turns into a crisis.

Paying down technical debt isn’t glamorous. Day to day, it can mean taking developer time away from building new features so they can go back through the code and eliminate shortcuts. In the moment, it feels like a delay. To the boss, it can look like wasted time.

In my experience, most organizations ignore their technical debt. It’s understandable. Combing through a lot of tedious code doesn’t feel like progress. But paying down technical debt is progress: you’re killing cyclical panic. When you’re in technical debt, you run with what you’ve got until it breaks. And when it breaks, you panic. Because it didn’t fail on a schedule you planned for, you’re suddenly on Red Alert with very little time to fix it.

So, you find a stopgap. You put a finger in the dike, and the immediate problem goes away. But that stopgap sits on top of several other stopgaps, because you haven’t been paying down your technical debt.

Eventually, you experience another breakage, another panic, another stopgap. You’re in a cyclical panic.

Don’t Rely on Tribal Knowledge

Cyclical panic causes another problem. Because of the many ad hoc fixes, processes become dependent on tribal knowledge.

Tribal knowledge exists for a variety of reasons, and it’s in every office. In practice, it can look like this: you enter requisitions in a spreadsheet and print them out, and the procedure says you hand the printouts to one designated person, who then enters the data. Eventually, that person leaves, and the entire process collapses. No one understands how the system broke down. No one knows how to fix it, either.

But change comes slowly. Recently, I met someone at a conference, and we spent a while talking about advances in network switches (as one does, or at least as I do). At the end of the conversation, he said, “Sounds great. But I have four more years on my core switches. That’s more than my budget for two years.”

His switches were already 10 years past end of life. Not end of support: end of life. But he hadn’t finished depreciating them yet, so he was hanging on to them.

Depreciation is the enemy of paying down technical debt. I realize you can’t have something new and fancy all the time; at some point, you’re going to stick with what you have. Future-proofing calls for a balance. It’s about finding the best technology capable of solving bona fide business problems. It’s about getting the most future bang for your buck.

Of course, that balance is hard to strike, and part of the difficulty is psychological: you have to learn to let go.

Crush the Concorde

When we don’t let go, we’re in danger of falling victim to the sunk-cost fallacy. Also called the Concorde fallacy, this flaw in reasoning occurs when we believe that what we’ve already invested (our sunk costs) justifies spending even more.

To understand the concept, think of the fallacy’s namesake, the Concorde supersonic airliner. The British and French governments funded it for 27 years, long after it was apparent there was no economic case for it.

Think of it as throwing good money after bad. In IT, it looks like this: you spent so much on equipment that you want to keep maintaining it and squeeze every last bit of use out of it, even though it has long outlived its usefulness.
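The way out of the fallacy is to compare only the costs that are still ahead of you. The sketch below is a minimal illustration under assumed inputs: the function names and every dollar figure are hypothetical, and real decisions involve more factors (risk, staff time, migration effort). The point it makes is that the amount already spent is accepted but never used, because it is the same whichever way you decide.

```python
def total_future_cost(upfront: float, per_year: float, years: int) -> float:
    """Cost still ahead of us: any upfront spend plus recurring yearly costs."""
    return upfront + per_year * years


def keep_or_replace(sunk_cost: float, keep_per_year: float,
                    replace_upfront: float, replace_per_year: float,
                    years: int) -> str:
    """Decide between keeping old gear and replacing it (hypothetical model).

    sunk_cost is accepted only to make the point: it is deliberately ignored,
    because money already spent is identical under either choice.
    """
    keep = total_future_cost(0, keep_per_year, years)
    replace = total_future_cost(replace_upfront, replace_per_year, years)
    return "replace" if replace < keep else "keep"


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    decision = keep_or_replace(
        sunk_cost=500_000,        # what the old switches originally cost
        keep_per_year=60_000,     # support, workarounds, outages (assumed)
        replace_upfront=120_000,  # new hardware (assumed)
        replace_per_year=15_000,  # running cost of new gear (assumed)
        years=4,
    )
    print(decision)  # replace: 180,000 still to spend vs. 240,000
```

Changing `sunk_cost` to any other value leaves the answer unchanged, which is exactly the discipline the Concorde fallacy skips.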

But if you do get new equipment, how do you make sure it’s actually future-proofed?

Unleash the Chaos Monkey

Whether you’re setting up a data center or building a team, you should ask yourself, “How flexible is this thing? When the next new thing comes along, how easy will it be to upgrade?”

One way to find out is to unleash the chaos monkey.

In programmer terms, a “chaos monkey” is a program that randomly shuts things down or deletes them. It messes things up on purpose, testing your software, your equipment, and your staff. The idea, in a controlled war game of course, is to find out how quickly you can recover from a hit.
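To make the idea concrete, here is a minimal simulation, not a real tool: it models a hypothetical fleet of services in memory, has the “monkey” terminate randomly chosen instances, and reports which services went down. The `Service` class, the instance names, and the `game_day` helper are all assumptions made for illustration; this is not how Netflix’s Chaos Monkey or any other production tool is implemented.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Service:
    """A hypothetical service backed by one or more redundant instances."""
    name: str
    instances: list[str]
    running: set[str] = field(init=False)

    def __post_init__(self) -> None:
        # Every instance starts out running.
        self.running = set(self.instances)

    def is_up(self) -> bool:
        # The service survives as long as at least one instance is running.
        return bool(self.running)


def unleash_chaos_monkey(fleet: list[Service], rng: random.Random):
    """Terminate one randomly chosen running instance anywhere in the fleet."""
    candidates = [(svc, inst) for svc in fleet for inst in sorted(svc.running)]
    if not candidates:
        return None  # nothing left to break
    svc, inst = rng.choice(candidates)
    svc.running.discard(inst)
    return svc.name, inst


def game_day(fleet: list[Service], rounds: int, seed: int = 42) -> list[str]:
    """Run the monkey for a few rounds and report which services went down."""
    rng = random.Random(seed)  # seeded so the drill is repeatable
    for _ in range(rounds):
        if unleash_chaos_monkey(fleet, rng) is None:
            break
    return [svc.name for svc in fleet if not svc.is_up()]


if __name__ == "__main__":
    fleet = [
        Service("web", ["web-1", "web-2"]),
        Service("db", ["db-1"]),  # single point of failure
    ]
    print("Down after one round:", game_day(fleet, rounds=1))
```

A service with at least two instances always survives a single round, while a single-instance service like `db` goes down whenever the monkey happens to pick it. Surfacing that kind of hidden single point of failure before a real outage does is the whole purpose of the exercise.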

The chaos monkey can also kill tribal knowledge by exposing it to everyone. I know of companies where the chaos monkey decides that individual team members must go home, regardless of deadlines or projects. If everything stops because that person is gone, you have too much tribal knowledge and you’re not future-proofed.

You Can’t Just Buy Your Way Into Future-Proofing

Even if you’re in some server-room utopia where vendors have provided you with future-proofed products, you may still not be future-proofed.

That’s because future-proofing isn’t about having the latest product from every vendor; those products may not talk to each other.

Each product must play well with the rest of your environment, because the people and equipment in accounting will be working with the people and equipment in security and sales, and so on, endlessly. The network team will be working in concert with the server team, the storage team, and the developers. Products from Brand X, Brand Y, and Brand Z will have to work together.

That’s why I think the answer lies in connected simplicity.

Simple Product, Intelligence in the Cloud

To be clear, I’m not saying cloud-based technologies are automatically future-proofed. But Cisco Meraki, a line of wireless access points, may point us where we need to go.

Meraki’s value proposition is that its intelligence lives in the cloud. The access points themselves are so simple and so flexible that new capabilities are delivered entirely in software. They don’t require hardware replacement for every upgrade.

Sometimes, you will need a physical replacement. But when a modular core switch needs an upgrade, you may only have to replace a single module within the switch, not the entire switch.

What the customer receives is simple. But the utility is highly sophisticated.

Future-proofing means accepting that products die. It also means making sure those products and programs keep working as far into the future as possible.

It’s about paying down technical debt. It’s about unleashing the chaos monkey. But it’s also about removing the physical and sociological barriers holding teams back.
