Welcome!

AdatoSystems is committed to helping organizations leverage systems management tools to increase stability, reliability, and value.

In plain English, we:

  • Install, Improve and Integrate monitoring systems and software (Monitoring)
  • Design and Deploy websites (Web Design)
  • Automatically Backup and Update your WordPress website (SiteButler)
  • Keep your “IT stuff” running with as little cost and effort as possible

A few of the amazing teams we’ve been privileged to work with can be found on our Customers Page. But the real story is what we can do for YOU.

There is nothing cookie-cutter about your organization, so our approach is anything but. We’ll learn your business goals and your technical challenges, then present you with options for how we can help.

Read More…

#FeatureFriday: Improving Alerts with Query Execution Plans

Welcome to “Feature Friday”, a series of (typically short) videos which explain a feature, function, or technique.

Alerts are, for many monitoring engineers, the bread-and-butter of their job. What many fail to recognize is that, regardless of how graphical and “Natural English Language” the alert builder appears, what you are really creating is a query. Often it is a query which runs frequently (every minute, or even more often) against the entire database.

Because of that, a single, poorly constructed query can have a huge (and hugely negative) impact on overall performance. Get a few bad eggs, and the rest of the monitoring system – polling, display, reports, etc. – can slow to a crawl, or even grind to a halt.

Luckily there’s a tool which can help you discover a query’s execution performance, and identify where the major bottlenecks are.
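
If you want to poke at a suspect query yourself before (or after) watching, here is a minimal sketch of pulling the estimated execution plan from SQL Server using Python and pyodbc. The server, database, and query below are placeholders invented for illustration, not values from a real Orion install:

    # Minimal sketch: fetch the estimated execution plan for a suspect alert query.
    # Assumes a SQL Server back end reachable via pyodbc; the server, database,
    # and query are made-up placeholders, not taken from a real Orion install.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=my-sql-server;DATABASE=MyMonitoringDB;"
        "Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Ask SQL Server to return the plan XML instead of executing the query.
    cursor.execute("SET SHOWPLAN_XML ON")

    # A stand-in for the query your alert definition generates behind the scenes.
    suspect_query = """
    SELECT NodeID, Caption, CPULoad
    FROM Nodes
    WHERE CPULoad > 90
    """
    cursor.execute(suspect_query)
    plan_xml = cursor.fetchone()[0]  # single row, single column: the plan as XML
    print(plan_xml)                  # look for table scans and missing-index warnings

    cursor.execute("SET SHOWPLAN_XML OFF")
    conn.close()

The same plan XML can also be opened in SQL Server Management Studio for a graphical view, which is usually the easier way to spot the expensive operators.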

In the video below, my fellow SolarWinds Head Geeks and I channel our inner SQLRockstar and dive into query execution plans and how to apply that technique to SolarWinds Orion alerts.


For more insights into monitoring, as well as random silliness, you can follow me on Twitter (@LeonAdato) or find me on the SolarWinds THWACK.com forums (@adatole)

#FeatureFriday: All About Diagnostics, Baselines, and Dependencies in SolarWinds Orion

Welcome to “Feature Friday”, a series of (typically short) videos which explain a feature, function, or technique.

After a quick reminder that running diagnostics is NOT just for when you are in trouble, my discussion with fellow SolarWinds Head Geeks Kong Yang and Patrick Hubbard turns to when and how to enable baseline calculations – which allow the system to use collected metrics to build a model of what is “normal” – and automatic dependencies – which suppress alerts on devices downstream of the root cause.
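
To make the baseline idea concrete before you hit play, here is a toy illustration of the general concept (a mean plus a standard-deviation band). It is not SolarWinds’ actual calculation, and the numbers are invented:

    # Toy illustration of the baseline concept: flag samples that fall outside a
    # "normal" band built from recent history. This is NOT SolarWinds' actual
    # algorithm; it just shows the general idea, with made-up numbers.
    from statistics import mean, stdev

    history = [42, 45, 44, 47, 43, 46, 44, 45]  # e.g. recent CPU% samples
    baseline = mean(history)
    band = 2 * stdev(history)                   # "normal" = baseline +/- 2 std devs

    new_sample = 78
    if abs(new_sample - baseline) > band:
        print(f"{new_sample}% is outside the baseline ({baseline:.1f} +/- {band:.1f}) - alert")
    else:
        print(f"{new_sample}% looks normal")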


For more insights into monitoring, as well as random silliness, you can follow me on Twitter (@LeonAdato) or find me on the SolarWinds THWACK.com forums (@adatole)

#FeatureFriday: Verifying and Fixing Permissions in SolarWinds Orion

Welcome to “Feature Friday”, a series of (typically short) videos which explain a feature, function, or technique.

Every once in a while I come across an installation that has inexplicable problems – little hiccoughs here and there. Not enough for the monitoring admin to throw up their hands in frustration, but enough to make people scratch their heads. Also not enough to make those admins open a ticket or even mention it to me when we’re talking casually.

Usually I (and the support team) hear about it, however, during upgrades. Because that’s when stuff just doesn’t work as expected. And often, this is because of permissions.

Before you even watch the video, let me lay some SolarWinds Orion wisdom on you: install as local administrator. Not admin-equivalent. Not “Joe who has admin privileges”. Not even DOMAIN admin. Local admin. Doing the install as anything else is going to cause you problems somewhere down the road.
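
As a quick sanity check before you even launch an installer, a hypothetical snippet like the one below shows which account you are about to install under and whether the console is elevated. (Caveat: it only verifies elevation; it cannot tell a local admin from a domain admin account, which is the distinction that matters here.)

    # Quick pre-install sanity check on Windows: which account is this, and is the
    # console elevated? Note: this checks elevation only; it does NOT distinguish
    # a local administrator from a domain administrator account.
    import ctypes
    import getpass

    elevated = bool(ctypes.windll.shell32.IsUserAnAdmin())
    print(f"Running as: {getpass.getuser()}")
    print("Elevated:  ", "yes" if elevated else "no - right-click, 'Run as administrator'")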

BUT… if this is the first time you are hearing about this (because nobody reads the admin guide), what do you do with the install you already have in place? The answer to that lies in the video below, where my fellow SolarWinds Head Geeks Kong Yang, Patrick Hubbard, and I talk about how to verify and fix permission issues in Orion:


For more insights into monitoring, as well as random silliness, you can follow me on Twitter (@LeonAdato) or find me on the SolarWinds THWACK.com forums (@adatole)

Blueprint: The Evolution of the Network, Part 2

NOTE: This article originally appeared here.

If you’re not prepared for the future of networking, you’re already behind.

That may sound harsh, but it’s true. Given the speed at which technology evolves compared to the rate most of us typically evolve in terms of our skillsets, there’s no time to waste in preparing ourselves to manage and monitor the networks of tomorrow. Yes, this is a bit of a daunting proposition considering the fact that some of us are still trying to catch up with today’s essentials of network monitoring and management, but the reality is that they’re not really mutually exclusive, are they?

In part one of this series, I outlined how the networks of today have evolved from those of yesteryear, and what today’s new essentials of network monitoring and management are as a consequence. If you paid careful attention, you likely picked up on how the lessons from the past that I described helped shape those new essentials.

Similarly, today’s essentials will help shape those of tomorrow. Thus, as I said, getting better at leveraging today’s essentials of network monitoring and management is not mutually exclusive from preparing for the networks of tomorrow.

Before delving into what the next generation of network monitoring and management will look like, it’s important to first explore what the next generation of networking will look like.

On the Horizon

Above all else, one thing is for certain: We networking professionals should expect tomorrow’s technology to create more complex networks resulting in even more complex problems to solve. With that in mind, here are the top networking trends that are likely to shape the networks of the future:

Networks growing in all directions
Fitbits, tablets, phablets and applications galore. The explosion of IoT, BYOD, BYOA and BYO-everything-else is upon us. With this trend still in its infancy, the future of connected devices and applications will be not only about the quantity of connected devices, but also about the quality of their connections and the network bandwidth they consume.

But it goes beyond the gadgets end users bring into the environment. More and more, commodity devices such as HVAC infrastructure, environmental systems like lighting, security devices and more all use bandwidth—cellular or WiFi—to communicate outbound and receive updates and instructions inbound. Companies are using, or planning to use, IoT devices to track products, employees and equipment. This explosion of devices that consume or produce data will, not might, create a disruptive explosion in bandwidth consumption, security concerns, and monitoring and management requirements.

IPv6 eventually takes the stage…or sooner (as in now!)
Recently, ARIN was unable to fulfill a request for IPv4 addresses because the request was greater than the contiguous blocks available. Meanwhile, IPv6 is now almost always enabled by default and is therefore creating challenges for IT professionals even if they, and their organizations, have committed to putting off their own IPv6 decisions. The upshot of all this is that IPv6 is a reality today. There is an inevitable and quickly approaching moment when switching over will no longer be an option, but a requirement.

SDN and NFV will become the mainstream
Software defined networking (SDN) and network function virtualization (NFV) are just in their infancy and should be expected to become mainstream in the next five to seven years. With SDN and virtualization creating new opportunities for hybrid infrastructure, a serious look at adoption of these technologies is becoming more and more important.

So long WAN Optimization, Hello ISPs
There are a number of reasons WAN optimization technology is being, and will continue to be, kicked to the curb. With bandwidth increases outpacing the ability of CPUs and custom hardware to perform deep inspection and optimization, and with ISPs helping to circumvent the cost and complexities associated with WAN accelerators, WAN optimization will only see the light of tomorrow in unique use cases where the rewards outweigh the risks. As most of us will admit, WAN accelerators are expensive and complicated, making ISPs more and more attractive. Their future inside our networks is certainly bright.

Farewell L4 Firewalling 
With the mass of applications and services moving toward web-based deployment, using Layer 4 (L4) firewalls to block these services entirely will not be tolerated. A firewall incapable of performing deep packet analysis and understanding the nature of the traffic at Layer 7 (L7), the application layer, will not satisfy the level of granularity and flexibility that most network administrators should offer their users. On this front, change is clearly inevitable for us network professionals, whether it means added network complexity and adapting to new infrastructures or simply letting withering technologies go.

Preparing to Manage the Networks of Tomorrow  

So, what can we do to prepare to monitor and manage the networks of tomorrow? Consider the following:

Understand the “who, what, why and where” of IoT, BYOD and BYOA
Connected devices cannot be ignored. According to 451 Research, mobile Internet of Things (IoT) and Machine-to-Machine (M2M) connections will increase to 908 million in just five years, compared to 252 million just last year. This staggering statistic should prompt you to start creating a plan of action for how you will manage nearly four times the number of devices infiltrating your networks today.

Your strategy can either aim to manage these devices within the network or set an organizational policy to regulate traffic altogether. As nonprofit IT trade association CompTIA noted in a recent survey, many companies are trying to implement partial and even zero-BYOD policies to regulate security and bandwidth issues. Even though policies may seem like an easy fix, curbing all of tomorrow’s BYOD/BYOA is nearly impossible. As such, you will have to understand your network device traffic in incremental metrics in order to optimize and secure it. Even more so, you will need to understand network segments that aren’t even in your direct control, like the tablets, phablets and Fitbits, to properly isolate issues.

Know the ins and outs of the new mainstream 
As stated earlier, SDN, NFV and IPv6 will become the new mainstream. We can start preparing for these technologies’ future takeovers by taking a hybrid approach to our infrastructures today. This will put us ahead of the game with an understanding of how these technologies work, the new complexities they create and how they will ultimately affect configuration management and troubleshooting ahead of mainstream deployment.

Start comparison shopping now
Going through the exercise of evaluating ISPs, virtualized network options and other on-the-horizon technologies—even if you don’t intend to switch right now—will help you nail down your particular requirements. Sometimes, knowing that a vendor has, or works with, technology you don’t need right now but might need later, such as IPv6, can and should influence your decision.

Brick in, brick out
Taking on new technologies can feel overwhelming to those of us with “boots on the ground” because the new technology can often seem like just one more mouth to feed, so to speak. As much as possible, look for ways that potential new additions will not just enhance, but replace, the old guard. Maybe your new real-time deep packet inspection won’t completely replace L4 firewalls, but if it can reduce them significantly—while at the same time increasing insight and the ability to respond intelligently to issues—then the net result should be a better day for you. If you don’t do this, then more often than not, new technology will indeed simply seem to increase workload and do little else. This is also a great measuring stick for identifying new technologies whose time may not have truly come just yet, at least not for your organization.

At a more basic layer, if you have to replace three broken devices and you realize that the newer equipment is far more manageable or has more useful features, consider replacing the entire fleet of old technology even if it hasn’t fallen apart yet. The benefits of consistency often far outweigh the initial pain of sticker shock.

To conclude this series, my opening statement from part one merits repeating: learn from the past, live in the present and prepare for the future. The evolution of networking waits for no one. Don’t be left behind.

#FeatureFriday: Understanding (and fixing) Logging Levels

Welcome to “Feature Friday”, a series of (typically short) videos which explain a feature, function, or technique.

Within SolarWinds Orion (the framework which provides a common set of functions like reporting and alerting, and which also provides the glue to bind together all the different modules), one of the more robust ways to understand what is happening under the hood is the log files. But all the information you COULD see is not there by default, so SolarWinds provides a way to tweak those logging levels.

Unfortunately, over time the logging levels end up out of whack with what you need on a daily basis.

In the video below SolarWinds Head Geeks Kong Yang, Patrick Hubbard, and I dig into where those logs are, what they contain, how to change your logging levels, and how to put them back to the “right” way when you are done.


For more insights into monitoring, as well as random silliness, you can follow me on Twitter (@LeonAdato) or find me on the SolarWinds THWACK.com forums (@adatole)

Change is Good… For Other People

Things really do stay the same more than they change.

And I’ll argue that they do so because we want them to stay the same. When you are responsible for monitoring thousands of devices, and you’ve built a career on your guru-like expertise in a particular toolset, the last thing you want is for everything (or even part of everything) to change radically.

If you are a MIB wizard, your worst fear may be that everything goes to REST API calls. If you’ve spent years learning the ins and outs of a vendor’s database, the last thing you want to hear is that they’re moving to NoSQL.

So how will we respond to the pressures of IoT, SDN, hybrid cloud? Heck, how are we responding to the pressure of BYOD?

Are you going to try to tackle it with more of the same old?

Or is it finally time to re-think the way YOU do things, and let the vendors catch up to you for once?

Blueprint: The Evolution of the Network, Part 1

NOTE: This article originally appeared here.

Learn from the past, live in the present and prepare for the future.

While this may sound like something that belongs hanging on a high school guidance counselor’s wall, these are words to live by, especially in IT. And they apply to perhaps no infrastructure element better than the network. After all, the network has long been a foundational building block of IT, it’s even more important today than it was in the days of SAGE and ARPANET, and its importance will only continue to grow even as the network itself becomes more complex.

For those of us charged with maintaining the network, it’s valuable to take a step back and examine the evolution of the network. Doing so helps us take an inventory of lessons learned—or the lessons we should have learned; determine what today’s essentials of monitoring and managing networks are; and finally, turn an eye to the future to begin preparing now for what’s on the horizon.

Learn from the Past

Think back to the time before the luxuries of Wi-Fi, before the proliferation of virtualization, and before today’s cloud computing.

The network used to be defined by a mostly wired, physical entity controlled by routers and switches. Business connections were based on T1 and ISDN, and Internet connectivity was always backhauled through the data center. Each network device was a piece of company-owned hardware, and applications operated on well-defined ports and protocols. VoIP was used infrequently, and anywhere connectivity—if even a thing—was provided by the low-quality bandwidth of cell-based Internet access.

With this yesteryear in mind, consider the following lessons we all (should) have learned that still apply today:

It Has to Work
Where better to start than with a throwback to the IETF’s RFC 1925, “The Twelve Networking Truths”? It’s just as true today as it was in 1996—if your network doesn’t actually work, then all the fancy hardware is for naught. Anything that impacts the ability of your network to work should be suspect.

The Shortest Distance Between Two Points is Still a Straight Line
Wired or wireless, MPLS, EIGRP or OSPF, your job as a network engineer is still fundamentally to create the conditions where the distance between the provider of information, usually a server, and the consumer of that information, usually a PC, is as near to a straight line as possible. When you forget that and get caught up in quality-of-service maps, automated functions and fault tolerance, you’ve lost your way.

An Unconfigured Switch is Better than the Wizard
It was a long-standing truth that running the configuration wizard on a switch was the fastest way to break it, whereas just unboxing and plugging it in would work fine. Wizards are a fantastic convenience and come in all forms, but if you don’t know what the wizard is making convenient, you are heading for trouble.

What is Not Explicitly Permitted is Forbidden
No, this policy is not fun, and it won’t make you popular. And it will actually create work for you on an ongoing basis. But there is honestly no other way to run your network. If espousing this policy will get you fired, then the truth is you’re going to get fired one way or the other. You might as well be able to pack your self-respect and professional ethics into the box along with your potted fern and stapler when the shoe drops. Because otherwise that huge security breach is on you.

Live in the Present 

Now let’s fast forward and consider the network of present day.

Wireless is becoming ubiquitous—it’s even overtaking wired networks in many instances—and the number of devices wirelessly connecting to the network is exploding (think Internet of Things). It doesn’t end there, though—networks are growing in all directions. Some network devices are even virtualized, resulting in a complex amalgam of the physical, the virtual and the Internet. Business connections are DSL/cable and Ethernet services, and increased use of cloud services is stretching Internet capacity at remote sites, not to mention opening security and policy issues since it’s not all backhauled through the data center. BYOD, BYOA, tablets and smartphones are prevalent, creating bandwidth capacity and security issues. Application visibility based on port and protocol is largely impossible due to applications tunneling via HTTP/HTTPS. VoIP is common, imposing higher demands on network bandwidth, and LTE provides high-quality anywhere connectivity.

Are you nostalgic for the days of networking yore yet? The complexity of today’s networking environment underscores that while lessons of the past are still important, a new set of network monitoring and management essentials is necessary to meet the challenges of today’s network administration head on. These new essentials include:

Network Mapping
While this is perhaps a bit back-to-basics, and also suitable as a lesson we all should have learned by now, when you consider the complexity of today’s networks and network traffic, network mapping, and the understanding of management and monitoring needs that follows from it, has never been more essential than it is today. Moving ahead without a plan—without knowing the reality on the ground—is a sure way to make the wrong monitoring choices based on assumptions and guesswork.

Wireless Management
The growth of wireless networks presents new problems, such as ensuring adequate signal strength and making sure the proliferation of devices and their physical mobility—potentially hundreds of thousands of network-connected devices, few of which are stationary and many of which may not be owned by the company (BYOD)—doesn’t get out of hand. What’s needed are tools such as wireless heat maps, user device tracking, identification of over-subscribed access points, and tracking and management of device IP addresses.

Application Firewalls
When it comes to surviving the Internet of Things, you first must understand that all of the “things” connect to the cloud. Because they’re not coordinating with a controller on the LAN, each device incurs a full conversation load, burdening the WAN and every element in a network. And worse, many of these devices prefer IPv6, meaning you’ll have more pressure to dual-stack all of those components. Application firewalls can untangle device conversations, get IP address management under control and help prepare for IPv6. They can also classify and segment device traffic; implement effective quality of service to ensure that critical business traffic has headroom; and of course, monitor flow.

Capacity Planning
Nobody plans for not growing; it’s just that sometimes infrastructure doesn’t read the plan we’ve so carefully laid out. You need to integrate capacity forecasting tools, configuration management and web-based reporting to be able to predict scale and growth. There’s the oft-quoted statistic that 70 percent of network outages come from unexpected network configuration changes. Admins have to avoid the Jurassic Park effect—unexpected outages that, in hindsight, were clearly predictable are the bane of any IT manager’s existence. “How did we not know about and respond to this?” is a question nobody wants to have to answer.
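
As a simplified sketch of the kind of projection those forecasting tools perform (a straight-line trend with invented numbers, ignoring seasonality and confidence intervals), you could estimate when a link runs out of headroom like this:

    # Simplified sketch of trend-based capacity forecasting: fit a straight line to
    # recent utilization and project when it crosses a threshold. Real forecasting
    # tools do far more (seasonality, confidence bands); the numbers here are invented.
    import numpy as np

    days = np.arange(30)                                   # last 30 days
    util = 52 + 0.6 * days + np.random.normal(0, 2, 30)    # avg % utilization per day

    slope, intercept = np.polyfit(days, util, 1)           # linear trend
    threshold = 90.0                                        # "full" for planning purposes

    if slope > 0:
        days_until_full = (threshold - (slope * days[-1] + intercept)) / slope
        print(f"Growing ~{slope:.2f}%/day; ~{days_until_full:.0f} days until {threshold}% utilization")
    else:
        print("No growth trend detected")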

Application Performance Insight
Many network engineers have complained that the network would be stable if it weren’t for the end users. While it’s an amusing thought, it ignores the universal truth of IT—everything we do is because of, and for, end users. The whole point of having a network is to run the business applications end users need to do their jobs. Face it: applications are king. Technologies such as deep packet inspection, or packet-level analysis, can help you ensure the network is not the source of application performance problems.

Prepare for the Future

Now that we’ve covered the evolution of the network from past to present—and identified lessons we can learn from the network of yesterday and what the new essentials of monitoring and managing today’s network are—we can prepare for the future. So, stay tuned for part two in this series to explore what the future holds for the evolution of the network.

#FeatureFriday: Checking and Changing Polling Cycles and Retention

Welcome to “Feature Friday”, a series of (typically short) videos which explain a feature, function, or technique.

One of the biggest challenges when maintaining a monitoring system is handling the sheer volume of data. While your initial thought may be “keep it all”, the reality is that this has implications for everything from display speed (data has to be pulled from the database and loaded onto the screen, and if the data set is large, the query time will be long) to storage (leading to the question, “if you were the IT pro who had everything, where would you put it all?”).
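
For a rough sense of why this matters, here is a back-of-the-envelope calculation of how fast polled data piles up. The element counts, intervals, and per-sample sizes are invented examples, not defaults from any product:

    # Back-of-the-envelope look at how polling interval and retention drive data
    # volume. All figures are invented examples, not defaults from any product.
    elements          = 5_000      # monitored interfaces/volumes/etc.
    poll_interval_min = 5          # minutes between polls
    retention_days    = 365        # how long detailed data is kept
    bytes_per_sample  = 200        # rough guess per stored data point

    samples_per_day = elements * (24 * 60 // poll_interval_min)
    total_samples   = samples_per_day * retention_days
    approx_gb       = total_samples * bytes_per_sample / 1024**3

    print(f"{samples_per_day:,} samples/day -> {total_samples:,} rows retained")
    print(f"Roughly {approx_gb:.0f} GB of detail data at {bytes_per_sample} bytes/sample")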

The secret to effectively managing this issue lies in understanding and being able to tune polling cycles and retention times, which is what I discuss with my fellow SolarWinds Head Geeks Kong Yang and Patrick Hubbard in the video below.


For more insights into monitoring, as well as random silliness, you can follow me on Twitter (@LeonAdato) or find me on the SolarWinds THWACK.com forums (@adatole)

It’s Not About What Happened

“I call bullshit,” he said with authority. “It couldn’t have happened like that. Let’s move on to something real.”

And just like that, he missed the most important part.

Because sometimes – maybe often – what you need to know is not whether something really, actually, 100% accurately happened “that way”.

It’s about what happened next. More to the point, it’s about what people did about the thing that may-or-may-not-have-happened-that-way.

How we respond to events (real or perceived) tells us and others more about who we are than the situations we find ourselves in.

“The entire data center was crashing”
(it was only 10 servers)
“The CEO was calling my cell every 2 minutes”
(he called twice in the first 30 minutes and then left you alone)
“It was a massive hack, probably out of China or Russia”
(it was a mis-configured router)

Whatever. I’m not as interested in that as in what you did next. Did you:

  • Call a vendor and scream that they needed to “fix this yesterday”?
  • Pull the team together, solicit ideas, and put together a plan?
  • Tell everyone to stay out of your way and work 24 hours straight to sort it out?
  • Wait 30 minutes before doing anything to see if anyone noticed, or if it sorted itself out?
  • Start documenting everything you saw happening, to review afterward?
  • Simply shut everything down and start it up again to see if that fixes it?
  • Look at your historical data to see if you can spot the beginning of the failure?
  • Immediately recover from backups, and let people know work will be lost?

Notice that most of those aren’t inherently wrong, although several could be, depending on the specific circumstances.

And that is the ONLY point where “what happened” comes into play. The events around us shape our environment.

But how we decide to respond shapes who we are.

When It Comes to System Outages, Don’t Prepare For the Worst

NOTE: This article originally appeared here.

During the 2014 World Cup, Nate Silver and the psychic witches he keeps in his basement — because how else could he make the predictions he does with such accuracy? — got it wrong. Really, really wrong. They were completely blindsided by Germany’s win over Brazil. As Silver described it, it was a completely unforeseeable event.

 

In sports and, to a lesser extent, politics, the tendency in the face of these things is to eat the loss, chalk it up to a fluke — a black swan in statistics parlance — and get on with life.

But as network administrators, we know that’s not how it works in IT.

In my experience, when a black swan event affects IT systems, management usually acquires a dark obsession with the event. Meetings are called under the guise of “lessons learned exercises,” with the express intent of ensuring said system outages never happen again.

Don’t spend too much time studying what might occur

Now, I’m not saying that after a failure we should just blithely ignore any lessons that could be learned. Far from it, actually. In the ashes of a failure, you often find the seeds of future avoidance. One of the first things an IT organization should do after such an event is determine whether the failure was predictable, or if it was one of those cases where there wasn’t enough historical data to determine a decent probability.

 

If the latter is the case, I’m here to tell you your efforts are much better spent elsewhere. What’s a better approach? Instead of spending time trying to figure out if a probability may or may not exist, catch and circumvent those common, everyday IT annoyances. This is a tactic that’s overlooked far too often.

Don’t believe me? Well, let’s take the example of a not-so-imaginary company I know that had a single, spectacular IT failure that cost somewhere in the neighborhood of $100,000. Management was understandably upset. It immediately set up a task force to identify the root cause of the failure and recommend steps to avoid it in the future. Sounds reasonable, right?

The task force — five experts pulled from the server, network, storage, database and applications teams — took three months and more than 100 staff-hours to investigate the root cause. Being conservative, let’s say the hourly cost to the company was $50. Now, multiply that by five people, then by 100 hours, then by three months. It comes to a nice round $125,000.

Not so reasonable after all

Yes, at the end of it all, the root problem was not only identified — at least, as much as possible — but code was put in place to (probably) predict the next time the exact same event might occur. Doesn’t sound so bad. But keep this in mind: the company spent $25,000 more than the cost of the original failure to create a solution that may or may not predict the occurrence of a black swan exactly like the one that hit before.

Maybe it wasn’t so reasonable after all.

You may be thinking, “But where else are you saying we should be focusing? After all, we’re held accountable to the bottom line as much as anyone else in the company.”

I get that, and it’s actually my point. Let’s compare the previous example of chasing a black swan to another, far more common problem: network interface card (NIC) failures.

In this example, another not-so-fictitious company saw bandwidth usage spike and stay high. NICs threw errors until the transmission rates bottomed out, and eventually the card just up and died. The problem was that while bandwidth usage was monitored, there was no alerting in place for interfaces that stopped responding or disappeared (the company monitored the IP address at the far end of the connection, which meant WAN links generated no alerts until the far end went down).

Let’s assume that a NIC failure takes an average of one hour to notice and correctly diagnose, and then it takes two hours to fix by network administrators who cost the company $53 per hour. While the circuit is out, the company loses about $1,000 per hour in revenue, lost opportunity, etc. That means system outages like this one could cost the company $3,106.

Setting a framework anchored by alerting and monitoring

Now, consider that, in my experience, proper monitoring and alerting reduces the time it takes to notice and diagnose problems such as NIC failures to 15 minutes. That’s it. Nothing else fancy, at least not in this scenario. But that simple thing could reduce the cost of the outage by $750.

I know those numbers don’t sound too impressive. That is, until you realize a moderately sized company can easily experience 100 NIC failures per year. That translates to more than $300,000 in lost revenue if the problem is unmonitored, and an annual savings of $75,000 if alerting is in place.

And that doesn’t take into account the ability to predict NIC failures and replace the card pre-emptively. If we estimate that 50% of the failures could be avoided using predictive monitoring, the savings could rise to more than $190,000.
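
If you want to check my math, here is the arithmetic from the NIC example spelled out, using the same assumptions as above (the predictive figure reflects one plausible reading of the 50% estimate):

    # The NIC-failure arithmetic from the example above, spelled out.
    admin_rate      = 53      # $/hour for the network admin
    revenue_loss    = 1000    # $/hour while the circuit is down
    hours_to_fix    = 2
    failures_per_yr = 100

    def outage_cost(hours_to_notice):
        downtime = hours_to_notice + hours_to_fix
        return downtime * revenue_loss + hours_to_fix * admin_rate

    unmonitored = outage_cost(1.0)    # 1 hour to notice and diagnose -> $3,106
    monitored   = outage_cost(0.25)   # 15 minutes with alerting      -> $2,356
    print(f"Per failure: ${unmonitored:,.0f} vs ${monitored:,.0f} "
          f"(saves ${unmonitored - monitored:,.0f})")

    yearly_unmonitored = failures_per_yr * unmonitored                 # > $300,000
    yearly_savings     = failures_per_yr * (unmonitored - monitored)   # $75,000
    # If predictive monitoring heads off half the failures entirely:
    predictive_savings = yearly_savings + 0.5 * failures_per_yr * monitored
    print(f"Yearly: ${yearly_unmonitored:,.0f} unmonitored; "
          f"${yearly_savings:,.0f} saved with alerting; "
          f"~${predictive_savings:,.0f} saved with prediction")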

Again, I’m not saying preparing for black swan events isn’t a worthy endeavor, but when tough budget decisions need to be made, some simple alerting on common problems can save more than trying to predict and prevent “the big one” that may or may not ever happen.

After all, NIC failures are no black swan. I think even Nate Silver would agree they’re a sure thing.