Category Archives: SolarWinds

IT Monitoring Scalability Planning: 3 Roadblocks

Planning for growth is key to effective IT monitoring, but it can be stymied by certain mindsets. Here’s how to overcome them.

As IT professionals, planning for growth is something we do all day almost unconsciously. Whether it’s a snippet of code, provisioning the next server, or building out a network design, we’re usually thinking: Will it handle the load? How long until I’ll need a newer, faster, or bigger one? How far will this scale?

Despite this almost compulsive concern with scalability, there are still areas of IT where growth tends to be an afterthought. One of these happens to be my area of specialization: IT monitoring. So, I’d like to address growth planning (or non-planning) as it pertains to monitoring by highlighting three mindsets that typically hinder this important but often surprisingly overlooked element, and showing how to deal with each.

The fire drill mindset
This occurs when something bad has already happened, either because there was no monitoring solution in place or because the existing toolset didn’t scale far enough to detect a critical failure, and so it was missed. Regardless, the result is usually a focus on finding a tool that would have caught the problem that already occurred, and finding it fast.

However, short of a TARDIS, there’s no way to implement an IT monitoring tool that will help avoid a problem after it occurs. Furthermore, moving too quickly as a result of a crisis can mean you don’t take the time to plan for future growth, focusing instead solely on solving the current problem.

My advice is to stop, take a deep breath, and collect yourself. Start by quickly but intelligently developing a short list of possible tools that will both solve the current problem and scale with your environment as it grows. Next, ask the vendors if they have free (or cheap) licenses for in-house demoing and proofs of concept.

Then, and this is where you should let the emotion surrounding the failure creep back in, get a proof-of-concept environment set up quickly and start testing. Finally, make a smart decision based on all the factors important to you and your environment. (Hint: one of which should always be scalability.) Then implement the tool right away.

The bargain hunter
The next common pitfall that often prevents better growth planning when implementing a monitoring tool is the bargain-hunter mindset. This usually occurs not because of a crisis, but when there is pressure to find the cheapest solution for the current environment.

How do you overcome this mindset? Consider the following scenario: If your child currently wears a size 3 shoe, you absolutely don’t want to buy a size 5 today, right? But you should also recognize that your child is going to grow. So, buying enough size 3 shoes for the next five years is not a good strategy, either.

Also, if financials really are one of the top priorities preventing you from better preparing for future growth, remember that the cheapest time to buy the right-sized solution for your current and future environment is now. Buying a solution for your current environment alone because “that’s all we need” is going to result in your spending more money later for the right-sized solution you will need in the future. (I’m not talking about incrementally more, but start-all-over-again more.)

My suggestion is to use your company’s existing business growth projections to calculate how big of a monitoring solution you need. If your company foresees 10% revenue growth each year over the next three years and then 5% each year after that, and you are willing to consider completely replacing your monitoring solution after five years, then buy a product that can scale to at least 140% of the size you currently need. (The five yearly figures add up to 40%; compounded, they come to about 47%.)
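To make that arithmetic concrete, here is a minimal sketch. It assumes, purely for illustration, that your monitoring footprint grows in step with revenue; the function name is my own invention and not part of any tool.

```python
def projected_capacity_factor(yearly_growth_rates):
    """Return the multiplier on today's size after compounding each year's growth."""
    factor = 1.0
    for rate in yearly_growth_rates:
        factor *= 1.0 + rate
    return factor


# 10% growth for three years, then 5% for two more (a five-year horizon).
rates = [0.10, 0.10, 0.10, 0.05, 0.05]
factor = projected_capacity_factor(rates)
print(f"Plan for about {factor:.2f}x today's size "
      f"({(factor - 1) * 100:.0f}% headroom)")
```

If your device count tends to grow faster or slower than revenue, swap in rates that match your own history.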

The dollar auction
The dollar auction mindset happens when there is already a tool in place — a tool that wasn’t cheap and that a lot of time was spent perfecting. The problem is, it’s no longer perfect. It needs to be replaced because company growth has expanded beyond its scalability, but the idea of walking away from everything invested in it is a hard pill to swallow.

Really, this isn’t so much of a mindset that prevents preparing for future growth as it is something that’s all too often overlooked as an important lesson: If only you had better planned for future growth the first time around. The reality is that if you’re experiencing this mindset, you need a new solution. However, don’t make the same mistake. This time, take scalability into account.

Whether you’re suffering from one of these mindsets or another that is preventing you from better preparing your IT monitoring for future growth, remember, scalability is key to long-term success.

(This article originally appeared on NetworkComputing)

Time for a network monitoring application? What to look for

You might think that implementing a network monitoring tool is like every other rollout. You would be wrong.

Oh, so you’re installing a new network monitoring tool, huh? No surprise there, right? What, was it time for a rip-and-replace? Is your team finally moving away from monitoring in silos? Perhaps there were a few too many ‘Let me Google that for you’ moments with the old vendor’s support line?

Let’s face it. There are any number of reasons that could have led you to this point. What’s important is that you’re here. Now, you may think a new monitoring implementation is no different than any other rollout. There are some similarities, but there are also some critical elements that are very different. How you handle these can mean the difference between success and failure.

I’ve found there are three primary areas that are often overlooked when it comes to deploying a network monitoring application. This isn’t an exhaustive list, but taking your time with these three things will pay off in the end.

Scope–First, consider how far and how deep you need the monitoring to go. This will affect every other aspect of your rollout, so take your time thinking this through. When deciding how far, ask yourself the following questions:

  • Do I need to monitor all sites, or just the primary data center?
  • How about the development, test or quality assurance systems?
  • Do I need to monitor servers or just network devices?
  • If I do need to include servers, should I cover every OS or just the main one(s)?
  • What about devices in DMZs?
  • What about small remote sites across low-speed connections?

And when considering how deep to go, ask these questions:

  • Do I need to also monitor up/down for non-routable interfaces (e.g., EtherChannel connections, multiprotocol label switching links, etc.)?
  • Do I need to monitor items that are normally down and alert when they’re up (e.g., cold standby servers, cellular wide area network links, etc.)?
  • Do I need to be concerned about virtual elements like host resource consumption by virtual machine, storage, security, log file aggregation and custom, home-grown applications?

Protocols and permissions–After you’ve decided which systems to monitor and what data to collect, you need to consider the methods to use. Protocols such as Simple Network Management Protocol (SNMP), Windows Management Instrumentation (WMI), syslog and NetFlow each have their own permissions and connection points in the environment.

For example, many organizations plan to use SNMP for hardware monitoring, only to discover it’s not enabled on dozens (or hundreds) of systems. Alternatively, they find out it is enabled, but the community strings are inconsistent, undocumented or unset. Then they go to monitor in the DMZ and realize that the security policy won’t allow SNMP across the firewall.

Additionally, remember that different collection methods have different access schemes. For example, WMI uses a Windows account on the target machine. If it’s not there, has the wrong permissions or is locked, monitoring won’t work. Meanwhile, SNMP uses a simple community string that can be different on each machine.
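One way to catch these surprises before the rollout is to audit whatever inventory you already have. The sketch below is only an illustration: the inventory layout, host names, and the `audit_snmp` helper are all hypothetical, and it checks recorded settings rather than talking to any device.

```python
# Hypothetical inventory export; field names and hosts are illustrative only.
inventory = [
    {"host": "core-sw-01", "snmp_enabled": True, "community": "monitor-ro"},
    {"host": "edge-rtr-02", "snmp_enabled": True, "community": "public"},
    {"host": "app-srv-03", "snmp_enabled": False, "community": None},
    {"host": "dmz-fw-04", "snmp_enabled": True, "community": None},
]


def audit_snmp(devices, expected_community):
    """Group hosts by the SNMP readiness problem they would cause."""
    findings = {"disabled": [], "unset": [], "inconsistent": []}
    for device in devices:
        if not device["snmp_enabled"]:
            findings["disabled"].append(device["host"])
        elif not device["community"]:
            findings["unset"].append(device["host"])
        elif device["community"] != expected_community:
            findings["inconsistent"].append(device["host"])
    return findings


report = audit_snmp(inventory, expected_community="monitor-ro")
for problem, hosts in report.items():
    print(f"{problem}: {hosts}")
```

The same idea applies to WMI: list the service account each target expects and flag the machines where it is missing or locked.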

Architecture–Finally, consider the architecture of the tools you’re considering. This breaks down to connectivity and scalability.

First, let’s consider connectivity. Agent-based platforms have on-device agents that collect and store data locally, then forward large data sets at regular intervals. Each collector bundles and sends this data to a manager-of-managers, which passes it to the repository. Meanwhile, agentless solutions use a collector that directly polls source devices and forwards the information to the data store.

You need to understand the connectivity architecture of these various tools so you can effectively handle DMZs, remote sites, secondary data centers and the like. You also need to look at the connectivity limitations of various tools, such as how many devices each collector can support and how much data will be traversing the wire, so you can design a monitoring implementation that doesn’t cripple your network or collapse under its own weight.
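A back-of-the-envelope calculation helps here. The sketch below is a rough sizing aid under stated assumptions: the per-collector limit, bytes per poll, and polling interval are placeholders I made up, so substitute the vendor’s documented limits and your own measurements.

```python
import math


def collectors_needed(device_count, devices_per_collector):
    """Round up: a partially filled collector is still a whole collector."""
    return math.ceil(device_count / devices_per_collector)


def polling_bandwidth_kbps(device_count, bytes_per_poll, poll_interval_seconds):
    """Average bandwidth consumed by polling, in kilobits per second."""
    bits_per_second = device_count * bytes_per_poll * 8 / poll_interval_seconds
    return bits_per_second / 1000


# Placeholder inputs: 2,500 devices, 1,000 devices per collector,
# 4,000 bytes per poll, one poll every 5 minutes.
print(collectors_needed(2500, 1000))
print(round(polling_bandwidth_kbps(2500, 4000, 300), 1))
```

Run the numbers per site rather than for the whole company, since a low-speed remote link is usually where polling traffic hurts first.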

Next comes scalability. Understand what kind of load the monitoring application will tolerate, and what your choices are to expand when (yes, when, not if) you hit that limit. To be honest, this is a tough one, and many vendors hope you’ll accept some form of an “it really depends” response.

In all fairness, it does depend, and some things are simply impossible to predict. For example, I once had a client who wanted to implement syslog monitoring on 4,000 devices. It ended up generating upwards of 20 million messages per hour. That was not a foreseeable outcome.
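To put those syslog numbers in terms a collector has to live with, here is a quick conversion. The function name is mine; the inputs are the figures from the example above.

```python
def message_rates(devices, messages_per_hour):
    """Convert an hourly message total into overall and per-device rates."""
    per_second_total = messages_per_hour / 3600
    per_device_per_hour = messages_per_hour / devices
    return per_second_total, per_device_per_hour


total_per_second, per_device_per_hour = message_rates(4000, 20_000_000)
print(f"{total_per_second:,.0f} messages/second overall")
print(f"{per_device_per_hour:,.0f} messages/hour per device")
```

That works out to roughly 5,556 messages every second, sustained, which is the figure to bring to a vendor when asking what their platform can absorb.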

By taking these key elements of a monitoring tool implementation into consideration, you should be able to avoid most of the major missteps many monitoring rollouts suffer from. And the good news is that from there, the same techniques that serve you well during other implementations will help here. You want to ask lots of questions; meet with customers in similar situations (similar environment size, business sector, etc.); set up a proof of concept first; engage experienced professionals to assist as necessary; and be prepared, both financially and psychologically, to adapt as wrinkles crop up. Because they will.

(This article originally appeared on SearchNetworking)

ICYMI: IT monitoring: ignore it at your peril

This interview was originally posted on http://onlyit.ca

To many businesses, IT monitoring software is a luxury they cannot afford. However, that mindset is dangerous. Not monitoring your IT infrastructure can cost you in stolen data and damage your reputation. Leon Adato, who holds the title of “head geek” at SolarWinds, shared his thoughts on why IT monitoring software is vital to the health of companies as well as the consequences of ignoring the need to monitor your IT infrastructure.

“Over the course of my 25 years in IT, with 12 years specifically focused on monitoring, I would say that more often than not (say, 60 percent of the time) businesses lack a gut understanding that monitoring helps save them money, and lots of it,” said Adato. “In addition, I’ve never seen a company, large or small, actually do the work to estimate and document the savings monitoring provides, either overall or on a per-alert basis.”

Adato recounted an anecdote from when he first started working in IT. “As an example, early in my career when I was doing desktop support, I got a call that the barcode printer on ‘production line seven’ was down,” he remarked. “When I got there, I realized the fix was going to take some time. It was the end of the day, I was tired and I wanted to get home. I figured this particular printer issue could wait until the next day. The guy working that line said to me, ‘I completely understand if you’ve got responsibilities, but let me make sure you understand the choice you are making: each one of these circuit boards is $10,000 of profit, and we don’t get the money until they ship, and they don’t ship until they get a barcode from that printer.’ I realized I was looking at 4 racks with about 150 boards per rack. I made a few calls and stayed late to get the printer back up and running.”

“The point of the story is that the guy on the line knew exactly what the cost breakdown was,” Adato continued. “He knew the material costs, labor costs, gross and net revenue, and he could have told you per minute, per hour, per production line how much money was being lost. That’s not uncommon in production environments. Unfortunately, companies usually don’t approach IT monitoring and alerting with the same attitude and level of awareness, even though they could, and in my opinion, certainly should.”
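The same kind of arithmetic can be written down for IT alerts. The sketch below uses the figures from the anecdote for the first calculation; the per-alert inputs in the second are made-up placeholders, and both function names are hypothetical.

```python
def profit_on_hold(racks, boards_per_rack, profit_per_board):
    """Profit that cannot be booked until the blocked units ship."""
    return racks * boards_per_rack * profit_per_board


def savings_per_alert(minutes_of_downtime_avoided, cost_per_minute):
    """Rough value of one alert that shortens an outage."""
    return minutes_of_downtime_avoided * cost_per_minute


# Figures from the anecdote: 4 racks, about 150 boards each, $10,000 per board.
print(f"${profit_on_hold(4, 150, 10_000):,}")

# Placeholder inputs: 30 minutes saved at $500 per minute of downtime.
print(f"${savings_per_alert(30, 500):,}")
```

Even a rough estimate like this, documented per alert, is more than most companies ever produce.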

Even if businesses have some type of IT monitoring in place, it might not be across the entire business. “Monitoring is always happening, whether it’s a server tech who checks all his servers manually from time to time (‘monitoring via eyeballs’) or teams that implement their own ‘skunkworks’ systems,” Adato commented. “People in the trenches don’t like surprises. Those systems will be narrowly focused, though, and will probably overlap in terms of features as well as scope. For example, the server team and the Exchange team might both monitor the same server; possibly using two different tools that collect much of the same data.” This approach is inefficient and not cost effective.

Adato cited the benefits of a business-wide IT monitoring program. He noted that it provides “the ability to have ongoing metrics that allow for capacity planning, forensic analysis of unexpected events (there will always be black swans) and the shortening of not only mean time to repair but also mean time to innocence by using data to prove that something, such as the network, is not at fault so efforts can be focused elsewhere.”

SolarWinds’ head geek acknowledged that businesses will need to invest financial and personnel resources into IT monitoring. Furthermore, IT monitoring can shatter some illusions about infrastructure. “[There is] the potentially unhappy realization that the environment is not as stable as you thought it was,” Adato said. He sees a silver lining to that situation, though. “Of course, this is a good thing masquerading as a bad thing because knowing there’s a previously-undetectable problem is the first step to fixing it before it blows up,” Adato concluded.

The Top 5 Network Issues You Didn’t Know You Had

(and how monitoring can solve them)

I spend a lot of time talking about the value that monitoring can bring an organization, and helping IT professionals make a compelling case for expanding or creating a monitoring environment. One of the traps I fall into is talking about the functions and features that monitoring tools provide while believing that the problems they solve are self-evident.

While this is often not true when speaking to non-technical decision makers, it can come as a surprise that it’s sometimes not obvious even to a technical audience!

So I have found it helpful to describe the problem first, so that the listener understands and buys into the fact that a challenge exists. Once that’s done, talking about solutions becomes much easier.

With that in mind, here are the top 5 issues I see in companies today, along with ways that sophisticated monitoring addresses them.

Wireless Networks

Issue #1

Ubiquitous wireless has directly influenced the decision to embrace BYOD programs, which has in turn created an explosion of devices on the network. It’s not uncommon for a single employee to have 3, 4, or even 5 devices.

This spike in device density has put an unanticipated strain on wireless networks. In addition to the sheer load, there are issues with the type of connections, mobility, and device proximity.

The need to know how many users are on each wireless AP, how much data they are pulling, and how devices move around the office has far outstripped the built-in options that come with the equipment.

Monitoring Can Help!

Wireless monitoring solutions tell you more than when an AP is down. They can alert you when an AP is over-subscribed, or when an individual device is consuming larger-than-expected amounts of data.

In addition, sophisticated monitoring tools now include wireless heat maps – which take the feedback from client devices and generate displays showing where signal strength is best (and worst) and the movement of devices in the environment.
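The alerting half of this is simple threshold logic once the data is collected. Here is a minimal sketch; the AP names, device names, and limits are all made up, and a real tool would pull these counts from the wireless controller rather than a hard-coded dictionary.

```python
# Hypothetical poll results; names and thresholds are illustrative only.
ap_clients = {"ap-lobby": 18, "ap-conf-room-4": 63, "ap-warehouse": 7}
device_usage_mb = {"laptop-ann": 850, "phone-bob": 120, "tablet-cy": 9400}


def oversubscribed_aps(clients_by_ap, max_clients):
    """Return the APs carrying more clients than the chosen limit."""
    return sorted(ap for ap, count in clients_by_ap.items() if count > max_clients)


def heavy_devices(usage_by_device, limit_mb):
    """Return the devices that transferred more than the chosen limit."""
    return sorted(dev for dev, used in usage_by_device.items() if used > limit_mb)


print(oversubscribed_aps(ap_clients, max_clients=50))
print(heavy_devices(device_usage_mb, limit_mb=5000))
```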

Capacity Planning

Issue #2

We work hard to provision systems appropriately, and to keep tabs on how those systems are performing under load. But this remains a largely manual process. Even with monitoring tools in place, capacity planning (knowing how far into the future a resource such as CPU, RAM, disk, or bandwidth will last given current usage patterns) is something that humans do, often with a lot of guesswork. And all too often, resources still reach capacity without anyone noticing until it is far too late.

Monitoring Can Help!

This is a math problem, pure and simple. Sophisticated monitoring tools now have the logic built-in to consider both trending and usage patterns day-by-day and week-by-week in order to come up with a more accurate estimate of when a resource will run out. With this feature in place, alerts can be triggered early enough that staff can do the higher-level analysis and act before the resource is exhausted.
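For a feel of the math involved, here is a deliberately simple sketch: a straight-line (least-squares) fit over daily samples, extrapolated to 100%. It ignores the day-by-day and week-by-week patterns that commercial tools account for, and the sample data and function name are my own.

```python
def days_until_full(daily_usage_percent, capacity_percent=100.0):
    """Fit a straight line to daily samples and extrapolate to capacity.

    Returns days remaining after the last sample, or None when usage
    is flat or shrinking (or there are too few samples to fit).
    """
    n = len(daily_usage_percent)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage_percent) / n
    slope_num = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(xs, daily_usage_percent))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    slope = slope_num / slope_den
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_percent - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Made-up samples: a disk growing half a percentage point per day.
samples = [70.0, 70.5, 71.0, 71.5, 72.0, 72.5, 73.0]
print(days_until_full(samples))
```

An alert could then fire when the estimate drops below, say, your hardware procurement lead time.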

Packet Monitoring

Issue #3

We’ve gotten very good at monitoring the bits on a network – how many bits per second in and out; the number of errored bits; the number of discarded bits. But knowing how much is only half the story. Where those bits are going and how fast they are traveling is now just as crucial. User experience is now as important as network provisioning. As the saying goes: “Slow is the new down.” In addition, knowing where those packets are going is the first step to catching data breaches before they hit the front page of your favourite Internet news site.

Monitoring Can Help!

A new breed of monitoring tools includes the ability to read data as it crosses the wire and track source, destination, and timing. Thus you can get a listing of internal systems and who they are connecting to (and how much data is being transferred) as well as whether slowness is caused by network congestion or an anaemic application server.
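Once the tool has captured who talked to whom, the “listing” part is an aggregation. The sketch below assumes flow records have already been exported as (source, destination, bytes) tuples; the records, addresses, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical flow records: (source, destination, bytes transferred).
flows = [
    ("10.0.0.5", "192.0.2.10", 1_200),
    ("10.0.0.5", "192.0.2.10", 800),
    ("10.0.0.7", "198.51.100.3", 5_000_000),
    ("10.0.0.5", "10.0.0.7", 300),
]


def bytes_by_conversation(records):
    """Total the bytes sent for each (source, destination) pair."""
    totals = defaultdict(int)
    for src, dst, size in records:
        totals[(src, dst)] += size
    return dict(totals)


def top_talkers(records, limit=3):
    """Return the busiest conversations, largest first."""
    totals = bytes_by_conversation(records)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


for (src, dst), size in top_talkers(flows):
    print(f"{src} -> {dst}: {size:,} bytes")
```

An internal host at the top of that list sending megabytes to an unfamiliar outside address is exactly the kind of thing you want to see before the news does.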

Intelligent Alerts

Issue #4

“Slow is the new down”, but down is still down, too! The problem is that knowing something is down gets more complicated as systems evolve. Also, it would be nice to alert when a system is on its way down, so that the problem could be addressed before it impacts users.

Monitoring Can Help!

Monitoring tools have come a long way since the days of “ping failure” notifications. Alert logic can now take into account multiple elements simultaneously such as CPU, interface, and application metrics so that alerts are incredibly specific. Alert logic also now allows for de-duplication, delay based on time or number of occurrences, and more. Finally, the increased automation built into target systems allows monitoring tools to take action and then re-test at the next cycle to see if that automatic action fixed the situation.
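Here is a minimal sketch of that kind of alert logic: several conditions must all be true, for a set number of consecutive polls, and the alert fires once rather than on every poll. The class, metric names, and thresholds are illustrative assumptions, not any product’s API.

```python
class CompoundAlert:
    """Fire once when every condition has held for `required_polls` polls in a row."""

    def __init__(self, conditions, required_polls=3):
        self.conditions = conditions          # metric name -> predicate
        self.required_polls = required_polls
        self.streak = 0
        self.active = False

    def evaluate(self, sample):
        """Return True only on the poll where the alert first triggers."""
        if all(check(sample[name]) for name, check in self.conditions.items()):
            self.streak += 1
        else:
            self.streak = 0
            self.active = False               # cleared; a later breach may alert again
        if self.streak >= self.required_polls and not self.active:
            self.active = True
            return True
        return False


alert = CompoundAlert(
    {"cpu_percent": lambda v: v > 90, "response_ms": lambda v: v > 500},
    required_polls=2,
)
polls = [
    {"cpu_percent": 95, "response_ms": 600},
    {"cpu_percent": 96, "response_ms": 700},
    {"cpu_percent": 97, "response_ms": 800},
    {"cpu_percent": 40, "response_ms": 100},
    {"cpu_percent": 95, "response_ms": 200},
]
fired = [alert.evaluate(p) for p in polls]
print(fired)
```

The third poll is still bad but produces no second notification, and the last poll breaches only one of the two conditions, so it stays quiet.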

Automatic Dependency Mapping

Issue #5

One device going down should not create 30 tickets. But it often does. This is because testing upstream/downstream devices requires knowing which devices those are, and how each depends on the other. This is either costly in terms of processing power, difficult given complex environments, time-consuming for staff to configure and maintain, or all three.

Monitoring Can Help!

Sophisticated monitoring tools now collect topology information using devices’ built-in commands, and then use that to build automatic dependency maps. These parent-child lists can be reviewed by staff and adjusted as needed, but they represent a huge leap ahead in terms of reducing “noise” alerts. And by reducing the noise, you increase the credibility of every remaining alert so that staff responds faster and with more trust in the system.
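The suppression step itself is straightforward once the parent-child list exists. In this sketch the topology is hand-written and hypothetical; in practice it would come from the discovery the tool performs.

```python
# Hypothetical parent map: child device -> the upstream device it depends on.
parents = {
    "access-sw-1": "dist-sw-1",
    "access-sw-2": "dist-sw-1",
    "dist-sw-1": "core-rtr-1",
    "printer-7": "access-sw-1",
}


def root_causes(down_devices, parent_of):
    """Keep only the down devices that have no down ancestor."""
    down = set(down_devices)
    roots = []
    for device in down_devices:
        ancestor = parent_of.get(device)
        seen = set()                          # guards against a looped parent map
        suppressed = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                suppressed = True
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            roots.append(device)
    return roots


print(root_causes(["printer-7", "access-sw-1", "access-sw-2", "dist-sw-1"], parents))
```

Four devices are unreachable, but only the distribution switch gets a ticket.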

So, what are you waiting for?

At this point, the discussion doesn’t have to spiral around whether a particular feature is meaningful or not. As long as the audience agrees that they don’t want to find out what happens when everyone piles into conference room 4, phones, pads, and laptops in tow; or when the “free” movie streaming site starts pulling data out of your drive; or when the CEO finds out that the customer site crashed because a disk filled, but had been steadily filling up for weeks.

As long as everyone agrees that those are really problems, the discussion on features more or less runs itself.

(originally published on GeekSpeak)

eBooks For Your 2016 Reading List

As we tip over from the mad rush of December and prepare to ease into another year, I like to take a minute to appreciate the hush and calm that comes after the rush and bustle of various holidays.

This week after New Year I like to take a few moments to pause and regroup before diving into the new year. A chance to take stock, reflect, and think.

And so I’ve held off until now to officially promote the fruit of a few of my 2015 labors. If your resolutions for 2016 include making time to do some reading that doesn’t break your stretched-too-far-after-all-those-gifts budget, I want you to know that I’ve got a few eBook recommendations for the busy IT Pro. Each is available for Kindle (on Amazon) and also as a free PDF download.

Monitoring 101

Despite the relative maturity of monitoring and systems management as a discrete IT discipline, I am asked – year after year and job after job – to give an overview of what monitoring is.

This book is my attempt to address that question in a more structured form, published with the assistance of the amazing folks at SolarWinds.

Intended as a guide to help bring new team members (often fresh out of college or a technical program) up to speed with monitoring concepts quickly, this ebook (or portions of it) can serve as a good introduction for a variety of audiences.

Click here for the Kindle Edition | Click here for the PDF version

“Technically, These Are Some Random Thoughts”
Around September every year, Jews all over the world celebrate Rosh Hashana, the Jewish New Year. However, it’s not – to put it in business terms – a year-end review. It’s a job interview. The month before Rosh Hashana (called “Elul” in Hebrew) is the time for getting one’s balance sheet in order. To help with that, a bunch of folks from all walks of life participate in #BlogElul: A daily prompt provides the theme and people riff on that – sometimes a few hundred words, sometimes an image, sometimes a poem or just a single sentence. It’s something I’ve done for a few years now.

I thought I’d add a twist and also do an I.T. Professional’s version of #BlogElul and post the essays on my technology-specific blog: http://www.adatosystems.com. Each essay is a reflection on one of the daily prompts and what it means in an I.T. context.

You’re probably thinking “Leon, this is a Jewish thing and completely outside the scope of my experience or interest as an I.T. Professional.” To which I emphatically reply: Yes and no. If you have worked in I.T. for more than 15 minutes, you’ve likely been involved in a large development project, system roll-out, or upgrade. And as the date for the big cut-over approaches, there are usually daily status updates. Consider these the notes from my status updates before the roll-out of “TheWorld v.5776”.

Click here for the Kindle Edition | Click here for the Nook Edition | Click here for the PDF version

4 Skills to Master Your Virtual Universe

For some IT administrators, virtualization might not be a primary responsibility. Without the opportunity to learn and gain experience as part of their daily routine, these admins are getting a late start in the virtualization game. So why should IT admins who don’t consider virtualization to be a critical part of their job description care about virtualization? Because virtualization spans every data center construct from servers to storage to networking to security operations. Add in the fact that it is used in practically every IT shop and you have a perfect IT storm. So while you might have been hired to administer one of those systems, virtualization’s dependence on and abstraction of those resources means you’ll need to bridge the virtualization knowledge gap.

In this book, my fellow SolarWinds Head Geek Kong Yang describes the 4 key skills needed to gain mastery of your virtualized environment.

Click here for the Kindle Edition | Click here for the PDF version

I’m participating in an online discussion about Thwack – the online community and resource for SolarWinds professionals. They asked for directions to my blog, so I wanted to post a big old welcome message.

Shout out to Jay, Crystal, JR, Bill, Steve/JR, Andrew, and our awesome meeting organizer Kelly!