Welcome!

Welcome to AdatoSystems, the “me, myself, and I” company for, well, me. Leon Adato. Head Geek at SolarWinds, 30+ year I.T. veteran, and all-around opinionated person. If you want to know more about what I can do for you or how to get in touch with me, check out the “About” page. Otherwise, enjoy the bloggy ramblings you’ll find below! And of course, feel free to connect with me on Twitter or LinkedIn.

ICYMI: The Five Stages of (Monitoring) Grief

(This article originally appeared on NetworkComputing.com)

If you’ve worked in IT for more than 10 minutes, you know that stuff goes wrong. In fact, it should be obvious that we have jobs in IT specifically because things go wrong.

And that’s what IT monitoring and automation is all about — building systems that automatically mind the shop, raise a flag when things start to go south and give the information needed to know what happened and when it happened so you can avoid it in the future.

After over a decade implementing monitoring systems at companies large and small, I’ve become all too familiar with what might be called monitoring grief. This is what often occurs when you are tasked with monitoring something, anything, for someone else — which is almost inevitable — and they ask you to do things you know are going to cause problems. It involves a series of behaviors I’ve grouped into five stages. Get it? The five stages of (IT monitoring) grief.

While companies often go through these stages when rolling out monitoring for the first time, they can also occur when a group or department starts to seriously implement an existing solution, when new capabilities are added to a current monitoring suite or simply when it’s Tuesday.

Spoiler alert: If you’re at all familiar with the standard Kübler-Ross five stages of grief model, acceptance is not on this list.

Stage one: Monitor everything

This is the initial monitoring non-decision, a response to the simple and innocent question, “What do I need to monitor?” The favorite choice of managers and teams who won’t actually get the ticket is to simply open the fire hose wide and request you to monitor “everything.” This choice is also frequently made by admins with a hair-on-fire problem in progress. This decision assumes that all the information is good information, and can be “tuned up” later.

Stage two: The Prozac moment

This stage follows closely on the heels of the first, when the recipient of 734 monitoring alert emails in five minutes comes to you and exclaims, “All these things can’t possibly be going wrong!” While this may be correct in principle, it ignores the fact that a computer only defines “going wrong” as specifically as the humans who requested the monitors in the first place. So, you ratchet things down to reasonable levels, but “too much” is still showing red and the reaction remains the same.

Worse, because monitoring was considered “bad” before (thanks to that ill-advised “monitor everything” request), monitoring must be wrong again. Except this time it isn’t wrong. It’s catching all the stuff that’s been going up and down for weeks, months, or years, but which nobody noticed. Either the failures self-corrected quickly enough, users never complained, or someone somewhere was jumping in and fixing it before anybody knew about it.

It’s at this moment you wish you could give the system owner Prozac so they will chill out and realize that knowing about outages is the first step to avoiding them in the future.

Stage three: Painting the roses green

The next stage occurs when too many things are still showing as “down” and no amount of tweaking is making them show “up” because, ahem, they are down.

In a fit of stubborn pride, the system owner often admits something like, “They’re not down-down, they’re just, you know, a little down-ish right now.” And so they demand that you do whatever it takes to show the systems as up/good/green.

And I mean anything: changing alert thresholds to impossible levels (“Only alert if it’s been down for 30 hours. No, make that a full week.”), or disabling alerts entirely. In one case, I was told, under threat of losing my job, to create a completely false status page with GIFs re-colored from red to green to show senior management.

What makes this stage even more embarrassing for all concerned is that the work involved in hiding the problem is often greater than the work it would take to actually fix it.

Stage four: An inconvenient truth

And so goes the web of deceit and lies, sometimes for weeks or months, until the point when there’s a critical error that can’t be glossed (or Photoshopped) over. At that point, you and the system owner find yourselves on a service restoration team phone call with about a dozen other engineers and a few high-ranking IT staffers where everything is analyzed, checked and restarted in real-time.

This is about the time someone asks to see the performance data for the system — the one that’s been down for a month and a half, but showed as “up” on the reports. For a system owner who has developed a habit of buying their green paint by the tanker-full, there is nowhere left to run or hide.

Stage five: Gaming the system

Assuming the system owner has managed through stage four with his or her job intact, stage five involves keeping you at arm’s length. The less sophisticated folks will find ways to have monitoring without all the messy inconvenience of actually having monitoring, while people who’ve been around a while will request detailed information on exactly what permissions you need in order to monitor. That information is passed along to an inevitably brand-new security audit team, which denies the access request out of hand because the permissions are too risky to give out.

At this point, you have a choice: Pull out all your documentation and insist you be given the permissions that have already been agreed upon, or go find another group that actually wants monitoring.

And what of the system owners who started off by demanding, “monitor everything?” Don’t worry, they’ll be back after the next system outage — to give you more grief.

Fat Tuesday: Legacy of a Mis-Spent Youth

As I’m only a week into this new health style choice (my coach Jeff still won’t let me call it a diet) there’s not a lot of change to report. But any good upgrade has to at least acknowledge the choices and conditions that led to the business case for the project, so here goes.

When I was younger I had, in the words of one dietitian, “the metabolism of a hyperactive ferret on speed.” At my heaviest, I was about 130 lbs, but I said I was 150 (which would only have been true if I was soaking wet, wearing army boots, with rocks in my pockets). In college, when I lived in the dorms and meal plans meant three all-you-can-eat sessions a day, I would still sometimes wake up famished at 2am. My solution was to walk up the street to McDonald’s, eat a Big Mac, and go back to bed. (I went to school in NYC, and they don’t call it the city that doesn’t sleep for nothing.)

What this meant was that not only did I NOT learn good eating habits, I actively cultivated POOR eating habits. Carbs were my best friend, as was anything chocolate covered. Eating after 10pm was encouraged. And so on.

It turns out age was the kryptonite to my powers of super digestion. I hit 30 and everything slowed down. Of course, it wasn’t quite overnight, but over the course of about two years I developed a paunch that was cute at first, but didn’t go away. At the same time, my career in IT (and the business itself) had transitioned from a largely active one (running around a classroom all day, or going cube to cube and floor to floor doing desktop support) to a desk-based one, and even a work-from-home one. Now I didn’t even have the walk across the parking lot to keep me moving.

By 40 the extra pounds were visibly adding up, and I added acid reflux to the mix. At this point, I realized I needed to take my eating habits seriously. Weight Watchers was already a thing in my house, so it was easy to fall in line. SOME of the pounds fell off, and that was good enough.

But, as 50 approached, the weight started creeping up again DESPITE the fact that I’d developed some relatively healthy habits. I was exercising fairly regularly, I was eating reasonably, but the pounds seemed to stick around no matter what I did.

And that’s what brought me to this point. There’s more to the story, but that’s a tale for another day.

And now to run the numbers. Here in week 1, I’m clocking in at:

  • 5′ 8″ tall (YAY, I’m not shrinking!)
  • 51 yrs old (it beats the alternative)
  • 180 lbs
  • 41″ belly
  • 39″ waist

ICYMI: What Defines You?

(This originally appeared on THWACK.com)

A few months back, SearchNetworking editor Chuck Moozakis interviewed me for an article discussing the future of network engineers in the IT landscape: “Will IT Generalists Replace Network Engineering Jobs?” As part of our discussion, he asked me, “What, in your mind, defined you as a networking pro in 1995, in 2005, and in 2015?” My initial answers are below, but his question got me thinking.

How we identify ourselves is a complex interaction of our beliefs, perceptions, and experiences. Just to be clear: I’m not qualified to delve into the shadowy corners of the human psyche as it relates to the big questions of who we are.

But in a much more limited scope, how we identify within the scope of IT professionals is an idea I find fascinating and ripe for discussion.

Every branch of IT has a set of skills specific to it, but being able to execute those skills doesn’t necessarily define you as “one of them.” I can write a SQL query, but that doesn’t make me a DBA. I can hack together a Perl script, but I am by no stretch of the imagination a programmer.

Adding to the confusion is that the “definitive” skills, those tasks which DO cause me to identify as a member of a particular specialty, change over time.

So that’s my question for you. What “are” you in the world of IT? Are you a master DBA, a DevOps ninja, a network guru? Besides your affinity to that area—your love of all things SQL or your belief that Linux is better than any other OS—what are the things you DO which in your mind “make” you part of that group? Tell me about it in the comments below.

For the record, here is how I answered Chuck’s original question:

“What made you identify as a networking professional in each of those years?”

1995

I was a networking professional because I understood the physical layer. I knew that token ring needed a terminator, and how far a single line could run before attenuation won out. I knew about MAUs and star topology. I could configure a variety of NICs on a variety of operating systems. I could even crimp my own CAT3 and CAT5 cables in straight-through or crossover configurations (and I knew when and why you needed each). While there were certainly middle pieces of the network to know about—switches, routers, and bridges—the mental distance between the user screen and the server (because in those days the server WAS the application) was very short. Even to the nascent internet, everything was hard-coded. In environments that made the leap to TCP/IP (often in combination with NetWare, AppleTalk, and NetBIOS), all PCs had public-facing IP addresses. NAT hadn’t been implemented yet.

2005

You could almost look at the early-to-mid 2000s as the golden age of the network professional. In addition to enjoying a VERY robust employment market, networking technologies were mature, sophisticated, complex, and varied. The CCNA exam still included questions on FDDI, Frame Relay, fractional T1s, and even a NetBIOS or AppleTalk question here or there (mostly how it mapped to the OSI model). But MPLS and dark fiber were happening, wireless (in the form of 802.11b with WEP) was on the rise, VoIP was stabilizing and coming down in cost to the point where businesses were seriously considering replacing all of their equipment, and the InfoSec professionals were being born in the form of ACL jockeys and people who knew how to do “penetration testing” (i.e., white-hat hacking). How did I fit in? By 2005 I was already focused on the specialization of monitoring (and had been for about 6 years), but I was a networking professional because I knew and understood at least SOME of what I just named, and could help organizations monitor it so they could start to pull back the veil on all that complexity.

2015

Today’s networking professional stands on the cusp of a sea-change. SDN, IoT, BYOD, cloud and hybrid cloud (and their associated security needs) all stand to impact the scale of networks and the volume of data they transmit in ways unimaginable just 5 years ago. If you ask me why I consider myself a networking professional today, it’s not because I have network commands memorized or because I can rack and stack a core switch in under 20 minutes. It’s because I understand all of that, but I’m mentally ready for what comes next.

ICYMI: Preparing For the Big One (or not)

(This originally appeared on Data Center Journal)

During last year’s  [ed: 2014] World Cup soccer competition, Nate Silver and the psychic witches he keeps in his basement — because how else could he make the predictions he does with such accuracy? — got it wrong. Really, really wrong. They were completely blindsided by Germany’s win over Brazil. As Silver described it, it was a completely unforeseeable event.

In sports and, to a lesser extent, politics, the tendency in the face of these things is to eat the loss, chalk it up to a fluke — a black swan in statistics parlance — and get on with life.

But as network administrators, we know that’s not how it works in IT.

In my experience, when a black swan event affects IT systems, management usually acquires a dark obsession with the event. Meetings are called under the guise of “lessons learned exercises,” with the express intent of ensuring said system outages never happen again.

Don’t spend too much time studying what might occur

Now, I’m not saying that after a failure we should just blithely ignore any lessons that could be learned. Far from it, actually. In the ashes of a failure, you often find the seeds of future avoidance. One of the first things an IT organization should do after such an event is determine whether the failure was predictable, or if it was one of those cases where there wasn’t enough historical data to determine a decent probability.

If the latter is the case, I’m here to tell you your efforts are much better spent elsewhere. What’s a better approach? Instead of spending time trying to figure out if a probability may or may not exist, catch and circumvent those common, everyday IT annoyances. This is a tactic that’s overlooked far too often.

Don’t believe me? Well, let’s take the example of a not-so-imaginary company I know that had a single, spectacular IT failure that cost somewhere in the neighborhood of $100,000. Management was understandably upset. It immediately set up a task force to identify the root cause of the failure and recommend steps to avoid it in the future. Sounds reasonable, right?

The task force — five experts pulled from the server, network, storage, database and applications teams — spent the better part of three months investigating the root cause. Being conservative, let’s say the fully loaded hourly cost to the company was $50 per person. Multiply that across five people working the problem for three months, and the bill comes to a nice round $125,000.

Not so reasonable after all

Yes, at the end of it all the root problem was not only identified — at least, as much as possible — but code was put in place to (probably) predict the next time the exact same event might occur. Doesn’t sound so bad. But keep this in mind: The company spent $25,000 more than the cost of the original failure to build a solution that may or may not predict the next occurrence of a black swan exactly like the one that hit before.

Maybe it wasn’t so reasonable after all.

You may be thinking, “But where else are you saying we should focus? After all, we’re held accountable to the bottom line as much as anyone else in the company.”

I get that, and it’s actually my point. Let’s compare the previous example of chasing a black swan to another, far more common problem: network interface card (NIC) failures.

In this example, another not-so-fictitious company saw bandwidth usage spike and stay high. NICs threw errors until the transmission rates bottomed out, and eventually the card just up and died. The problem was that while bandwidth usage was monitored, there was no alerting in place for interfaces that stopped responding or disappeared (the company monitored the IP at the far end of the connection, which meant WAN links generated no alerts until the far end went down).

Let’s assume that a NIC failure takes an average of one hour to notice and correctly diagnose, and then it takes two hours to fix by network administrators who cost the company $53 per hour. While the circuit is out, the company loses about $1,000 per hour in revenue, lost opportunity, etc. That means a system outage like this one could cost the company $3,106: three hours of lost revenue plus two hours of admin labor.

Setting a framework anchored by alerting and monitoring

Now, consider that, in my experience, proper monitoring and alerting reduces the time it takes to notice and diagnose problems such as NIC failures to 15 minutes. That’s it. Nothing else fancy, at least not in this scenario. But that simple thing could reduce the cost of the outage by $750.

I know those numbers don’t sound too impressive. That is, until you realize a moderately sized company can easily experience 100 NIC failures per year. That translates to more than $300,000 in lost revenue if the problem is unmonitored, and an annual savings of $75,000 if alerting is in place.

And that doesn’t take into account the ability to predict NIC failures and replace the card pre-emptively. If we estimate that 50% of the failures could be avoided using predictive monitoring, the savings could rise to more than $190,000.
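To make the comparison concrete, here’s a quick back-of-the-envelope sketch of the math using the illustrative figures above (the hourly rates, failure counts, and revenue impact are this article’s assumptions, not measured data):

```python
# Back-of-the-envelope outage cost model using the article's illustrative numbers.
ADMIN_RATE = 53          # network admin cost, $/hour
REVENUE_LOSS = 1000      # revenue/opportunity lost while the circuit is down, $/hour
FIX_HOURS = 2            # hands-on repair time per NIC failure
FAILURES_PER_YEAR = 100  # rough count for a moderately sized company

def outage_cost(detect_hours):
    """Cost of one NIC failure, given how long it takes to notice and diagnose."""
    downtime = detect_hours + FIX_HOURS
    return downtime * REVENUE_LOSS + FIX_HOURS * ADMIN_RATE

unmonitored = outage_cost(detect_hours=1.0)    # ~1 hour to notice without alerting -> $3,106
monitored = outage_cost(detect_hours=0.25)     # ~15 minutes with proper alerting   -> $2,356
per_failure_savings = unmonitored - monitored  # $750

yearly_cost = unmonitored * FAILURES_PER_YEAR             # > $300,000 if unmonitored
yearly_savings = per_failure_savings * FAILURES_PER_YEAR  # $75,000 with alerting

# If predictive monitoring lets you replace half the NICs before they die, those 50
# outages are avoided entirely and the other 50 are simply caught faster.
predictive_savings = (0.5 * FAILURES_PER_YEAR * unmonitored
                      + 0.5 * FAILURES_PER_YEAR * per_failure_savings)  # > $190,000

print(f"${unmonitored:,.0f} per failure, ${yearly_cost:,.0f} per year unmonitored")
print(f"${yearly_savings:,.0f} saved with alerting, ${predictive_savings:,.0f} with prediction")
```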

Again, I’m not saying preparing for black swan events isn’t a worthy endeavor, but when tough budget decisions need to be made, some simple alerting on common problems can save more than trying to predict and prevent “the big one” that may or may not ever happen.

After all, NIC failures are no black swan. I think even Nate Silver would agree they’re a sure thing.

Fat Tuesday: GOTO 0

As a true IT pro and geek at heart, I like data. As a true middle-aged white dude working in a job that keeps me sedentary for most of the day, I struggle with my weight. And as someone whose work puts them in front of large crowds, both in live settings and on video, I’m more than a little self-conscious about how I look.

Which brings me to the current state. For the last couple of years, I’ve lost and regained the same 5 pounds every week or so, never being able to crack that lower boundary. So, like an application programmer looking to obtain the best database performance possible, I’ve engaged the help of a pro. In this case it’s not a DBA, but a coach to help me focus on healthy habits and lifestyles. His name is Jeff. He’s pretty cool, and you can find his info here.

He also coached me not to say “weight loss,” which is probably some “eye of the tiger” mumbo jumbo so I don’t focus on how much baklava I won’t be eating for a little while. You can see it’s working.

While coaching is a piece of the puzzle, we all know that you can’t determine if the situation has improved unless you start monitoring it. That means collecting baselines prior to beginning the project, regular monitoring during, and reporting so that you know when you’ve achieved optimal performance.

So that’s what this ongoing Tuesday column will be. I’m going to try to give the unvarnished data along with anecdotal information about how or why we’re seeing whatever trends come to light.

So here goes. Here in week 0:

  • 5′ 8″ tall (I don’t expect that to change)
  • 51 yrs old (I predict very slow progress on this number)
  • 188.5 lbs
  • 44″ belly
  • 43″ waist

I’ll have more details about the program, my perceptions of it, and my health history as things progress. I hope you’ll stick with me on this adventure!

ICYMI: What Makes Us Go To Extremes?

(This originally appeared on THWACK)

I’ve really enjoyed watching (and re-watching, a few times) this video SolarWinds made in honor of SysAdmin Day (SysAdmin Day Extreme). What appeals to me – besides the goofy sound effects and scenarios (Seriously, guys? “Parkour Password Reset”?!?) – is the underlying premise: that sysadmins are adrenaline junkies and our job is a constant series of Red Bull-fueled obstacles we must overcome. Because even though that doesn’t match the reality in our cubicles, it is often the subtext we have running through our heads.

In our heads, we’re Scotty (or Geordi) rerouting power. We’re Tony Stark upgrading that first kludged-together design into a work of art. We’re MacGyver. And yeah, we’re also David Lightman in “WarGames,” messing around with new tech and unwittingly getting in completely over our heads.

As IT Professionals, we know we’re a weird breed. We laugh at what Wes Borg said back in 2006, but we also know it is true: “…there is nothing that beats the adrenaline buzz of configuring some idiot’s ADSL modem even though he’s running Windows 3.1 on a 386 with 4 megs of RAM, man!”.

And then we go back to the mind-numbing drudgery of Windows patches and password resets.

I’ve often said that my job as a sysadmin consists of long stretches of soul-crushing frustration, punctuated by brief moments of heart-stopping panic, which are often followed by manic euphoria.

In those moments of euphoria we realize that we’re a true superhero, that we have (as we were told on Saturday mornings) “powers and abilities far beyond those of mortal man.”

That euphoria is what makes us think that 24 hour datacenter cutovers are a good idea; that carrying the on-call pager for a week is perfectly rational; that giving the CEO our home number so he can call us with some home computer problems is anything resembling a wise choice.

So, while most of us won’t empty an entire k-cup into our face and call it “Caffeine Overclocking”, I appreciate the way it illustrates a sysadmin’s desire to go to extremes.

I also like that we’ve set up a specific page for SysAdminDay and that along with the video, we’ve posted links to all of our free (free as in beer, not just 30 day demo or trial) software, and some Greeting Cards with that special sysadmin in your life in mind.

Oh, and I’ve totally crawled through a rat’s nest of cables like that to plug something in.

What are some of your “extreme SysAdmin” stories?

ICYMI: Respect Your Elders

(This article originally appeared on THWACK’s GeekSpeak forum)

“Oh Geez,” exclaimed the guy who sits 2 desks from me, “that thing is ancient! Why would they give him that?”

Taking the bait, I popped my head over the wall and asked “what is?”

He showed me a text message, sent to him from a buddy—an engineer (EE, actually) who worked for an oil company. My co-worker’s iPhone 6 displayed an image of a laptop we could only describe as “vintage”:

(A Toshiba Tecra 510CDT, which was cutting edge…back in 1997.)

“Wow.” I said. “Those were amazing. I worked on a ton of those. They were serious workhorses—you could practically drop one from a 4 story building and it would still work. I wanted one like nobody’s business, but I could never afford it.”

“OK, back in the day I’m sure they were great,” said my 20-something coworker dismissively. “But what the hell is he going to do with it NOW? Can it even run an OS anymore?”

I realized he was coming from a particular frame of reference that is common to all of us in I.T. Newer is better. Period. With few exceptions (COUGH-Windows M.E.-COUGH), the latest version of something—be it hardware or software—is always a step up from what came before.

While that’s true, it leads to a frame of mind that is patently untrue: a belief that what is old is also irrelevant. Especially for I.T. professionals, it’s a dangerous line of thought that almost always leads to unnecessary mistakes and avoidable failures.

In fact, ask any I.T. pro who’s been at it for a decade, and you’ll hear story after story:

  • When programmers used COBOL, back when dinosaurs roamed the earth, one of the fundamental techniques drilled into their heads was, “check your inputs.” Think about the latest crop of exploits, be they an SSLv3 thing like POODLE, a SQL injection, or any of a plethora of web-based security problems: the fundamental flaw is the server NOT checking its inputs for sanity.
  • How about the OSI model? Yes, we all know it’s required knowledge for many certification exams (and at least one IT joke). But more importantly, it was (and still is) directly relevant to basic network troubleshooting.
  • Nobody needs to know CORBA database structure anymore, right? Except that a major monitoring tool was originally developed on CORBA and that foundation has stuck. Which is why, if you try to create a folder-inside-a-folder more than 3 times, the entire system corrupts. CORBA (one of the original object-oriented databases) could only handle 3 levels of object containership.
  • PowerShell can be learned without understanding Unix/Linux command-line concepts. But it’s sure EASIER to learn if you already know how to pipe ls into grep into awk so that you get a list of just the files you want, sorted by date. Supporting that style of pipeline (among other Unix/Linux concepts) was one of the original design goals of PowerShell.
  • Older rev’s of industrial motion-control systems used specific pin-outs on the serial port. The new USB-to-Serial cables don’t mimic those pin-outs correctly, and trying to upload a program with the new cables will render the entire system useless.

And in fact, that’s why my co-worker’s buddy was handed one of those venerable Tecra laptops. It had a standard serial port, and it was preloaded with the vendor’s DOS-based ladder-logic programming utility. Nobody expected it to run Windows 10, but it filled a role that modern hardware simply couldn’t.

It’s an interesting story, but you have to ask: aside from some interesting anecdotes and a few bizarre use cases, does this have any relevance to our day-to-day work?

You bet.

We live in a world where servers, storage, and now the network are rushing toward a quantum singularity of virtualization.

And the “old-timers” in the mainframe team are laughing their butts off as they watch us run in circles, inventing new words to describe techniques they learned at the beginning of their career; making mistakes they solved decades ago; and (worst of all) dismissing everything they know as utterly irrelevant.

Think I’m exaggerating? SAN and NAS look suspiciously like DASD, just on faster hardware. Services like Azure and AWS, for all their glitz and automation, aren’t as far from rented time on a mainframe as we’d like to imagine. And when my company replaces my laptop with a fancy “appliance” that connects to a Citrix VDI session, it reminds me of nothing so much as the VAX terminals I supported back in the day.

My point isn’t that I’m a techno-Ecclesiastes shouting “there is nothing new under the sun!” Or some I.T. hipster who was into the cloud before it was cool. My point is that it behooves us to remember that everything we do, and every technology we use, had its origins in something much older than 20 minutes ago.

If we take the time to understand that foundational technology, we have the chance to avoid past missteps, leverage undocumented efficiencies built into the core of the tools, and build on ideas elegant enough to have withstood the test of time.

SCP Exam Overview – My Perspective

(NOTE: This is an OLD post from THWACK, which I’m re-posting here for posterity. Much about the SCP program has changed since writing this, and you can expect an updated post soon.)

I like to take tests. I’m just weird that way. At one of my jobs, they actually put “exam hamster” on my business cards. For me, it’s like doing a crossword puzzle, and most of the time I don’t have a lot riding on the exam. Plus, with IT certification tests, I know I can usually retake them if I bomb horribly. So, at the very worst, taking a test and doing badly is just a way of finding out EXACTLY what kinds of questions that test is going to ask.

I recently took the SCP exam “cold,” meaning I looked over the sample questions, watched a couple of the online videos, and then said “what the hell” and dove in.

Now “cold” for me means: I have used SolarWinds (on and off) since 2004, I passed my CCNA (also in 2004; it has since lapsed), and I’ve been working with monitoring tools (BMC Patrol, Tivoli, OpenView, Nagios, Zenoss, etc.) for the last 11 years. But the point is, I didn’t intensively study the SCP prep material so that I’d know the “right” answers.

The rest of the guys on my team want to take the test so I wrote up an overview of the exam for them, which appears below. I thought I would share it here in case:

  1. you don’t share my love of tests
  2. you aren’t sure if you are ready
  3. you don’t want to waste your money/time/hair/stomach lining by feeling unprepared.

(NOTE: I checked with the SolarWinds Exam Overlords, to make sure I’m not giving too much away here. Just in case you are worried about that kind of thing. I was.)

Test Mechanics Overview

  • The test is online. You don’t go into a testing center. You can take it from work, home, the coffee shop, your secret lair in your parent’s basement, etc.
  • The test is made up of 77 randomly selected multiple choice questions.
  • The test is not adaptive. You will answer all 77 questions.
  • The exam is “one way” – no “back”, no “skip”, no “pause” and no “review my answers”
  • Most questions have 1 answer.
  • A few have multi-answer (but it will tell you how many – ie: “Pick the best 2 from the choices below”).
  • Partial answers are marked as wrong.
  • Blank answers are marked as wrong
  • If you accidentally leave a blank or partial answer, you’ll get a warning prompt. But if you confirm, it’s done.
  • You have 90 minutes to complete the test
    • DON’T PANIC! That’s a little over 1 minute per question. PLENTY of time. Seriously.
    • No seriously. Make a fist and squeeze it as hard as you can for 60 seconds. That’s how long you have to think about and answer EACH question.
  • 70% is passing.
  • Every question is worth the same (i.e., questions are not weighted)
    • That means you need 54 correct answers to pass.
    • Or, to put it another way, you can get 23 questions wrong and still pass (the quick math sketch after this list shows the arithmetic).
  • You have 3 attempts to pass the exam
  • You must wait 14 days between attempts
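For the spreadsheet-inclined, the scoring and timing math above works out like this; a trivial sketch whose only inputs are the numbers already listed:

```python
import math

QUESTIONS = 77
PASSING_PCT = 0.70
TIME_LIMIT_MIN = 90

needed = math.ceil(QUESTIONS * PASSING_PCT)        # 54 correct answers to pass
allowed_wrong = QUESTIONS - needed                 # 23 questions you can miss
minutes_per_question = TIME_LIMIT_MIN / QUESTIONS  # ~1.17 minutes per question

print(needed, allowed_wrong, round(minutes_per_question, 2))
```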

Am I Ready?

There are a couple of ways I think you can confirm you are ready:

  1. You go through the sample tests and you not only get the answers right, you understand:
    1. The broader topic they are discussing (netflow, router configuration, firewalls, troubleshooting)
    2. WHY the right answer is the right one (versus the others)
    3. HOW the other answers are wrong for this situation
    4. WHERE those other answers would be the correct answer
  2. When you read one of the sample test questions, you know what screen/utility they are talking about and you can get to that same screen and use it (maybe not for what THEY are asking, but you know how to get around in it).

General testing ideas (they work for any test)

  • TAKE YOUR TIME
  • Don’t give up.
    • I watched a guy bail on his CCNA exam with 10 questions left. When the proctor ran the score, he missed passing by 5 points.
  • If you are really stumped, start over by looking at the answers first, and then seeing which one(s) seem to fit the question.
  • Remember, this is a SolarWinds exam. The right answer is always from the SW perspective (i.e., if you have a choice between “do it with a DOS batch file” and “do it with a SolarWinds SAM script” and you aren’t sure, SolarWinds is your better bet).
  • If you don’t know the answer and one of the answers is significantly longer than the rest, that’s a good bet as well.
  • If you really don’t know, eliminate the stupid answers (there’s usually at least one) and then guess.

Good Ideas for the SCP

  • This is “open book” – have a separate browser window with tabs open to Google, THWACK, and the NPM admin guide, as well as a browser AND an RDP session open to the polling engine (assuming you have one handy) so you can check things out before hitting “submit”
  • Also have a calculator open
  • Also have a subnet calculator open

Specific thoughts on each of the sections

** Indicates thoughts I added after I took the test

Network Management Fundamentals

  • Know the OSI model (come on dude: All People Seem To Need Data Processing) and how SolarWinds stuff (SNMP, Netflow, SSH, etc) maps to it.
  • Ping is ICMP (no port)
  • SNMP poll is UDP port 161, trap is 162
  • Syslog is port 514
  • NetFlow is UDP 2055
  • Netflow is push-based. When the router sees a conversation is done, it pushes the information to the configured Netflow receiver(s)
  • WMI requires 1) the service to be enabled on the target server, 2) all ports over 1024, 3) a windows user account (domain or local)
  • Know the terms MIB, OID, Perfmon Counter
  • ** Know the very basic basics of subnetting (i.e., that 10.10.12.1 and 10.10.15.1 both fall inside the same subnet when the mask is 255.255.240.0; see the quick sketch after this list)
  • ** Know the IOS string to configure a router for SNMP (traps and poll)
  • ** Know the basic concept of how to configure an ACL on a router
  • ** Know what NetFlow is, how it works, etc.
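If subnetting is the part that makes you nervous, here’s a quick way to sanity-check the example from that bullet using nothing but Python’s standard library (the addresses and mask are just the ones mentioned above):

```python
import ipaddress

# 255.255.240.0 is a /20 mask. The network containing 10.10.12.1 with that mask
# is 10.10.0.0/20, which spans 10.10.0.0 through 10.10.15.255.
network = ipaddress.ip_network("10.10.12.1/255.255.240.0", strict=False)

print(network)                                        # 10.10.0.0/20
print(ipaddress.ip_address("10.10.12.1") in network)  # True
print(ipaddress.ip_address("10.10.15.1") in network)  # True
print(ipaddress.ip_address("10.10.16.1") in network)  # False -- that's the next /20 up
```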

Network Management Planning

  • Protocol “load” on the network from least to most: ICMP, SNMP, WMI, NetFlow
  • Document goals, take a baseline
  • Know how to build a report
  • When do you need a distributed solution?
    • Shaky connections back to the poller
    • Redundancy
    • ACL issues
  • Understand SolarWinds licensing model

Network Management Operations

  • Know the SNMP versions (1, 2c, 3) and what each added to the standard
  • Know NetFlow versions (5, 9, and 9 with IPFIX) and the basic features of each
  • Know the levels of Syslog logging in order
  • Know what SNMP provides versus Netflow

Network Fault & Performance Troubleshooting

  • Know the OSI model, and where (what layer) you would find: telnet, ping, ssh, ACL’s, snmp, syslog and netflow
  • ** Know the format of an OSPF routing syslog alert, when routing is failing
  • ** Think through the SaaS CRM example and all the different ways it could fail, and how you would determine it (ping stops on their network…etc.)
  • ** Understand the various levels of counters (node details pages) in the virtualization “stack” (VCenter, Cluster, Datacenter, Host, guests)
  • ** Know what SolarWinds can tell you about virtual environments that would prompt you to change that environment (i.e., how do you know when it’s time to add more hosts?)
  • Understand general VMWare concepts: virtualcenter, cluster, datacenter, host, guest; what happens when you go p2v, etc.

SolarWinds NPM Administration

  • Obviously, this is the biggest section
  • Understand escalation triggers
  • Know the basics of the SolarWinds Engineer’s Toolset and how to integrate it with NPM
  • ** Understand HOW reportwriter works (how to create, clone, import, export – including to/from thwack)
  • ** Understand report design – timeframes, groupings, summarization
  • ** Understand WHAT report scheduler does (but not necessarily HOW to configure it)
  • ** Understand account settings – especially limitations and how they work
  • ** Understand network discovery, including the one we never use (seed file)
  • ** Know how to create a trap alert versus a regular alert (ditto for syslog)
  • ** Know the indicators that tell you the SolarWinds installation (database, etc) is over capacity

#FeatureFriday: Improving Alerts with Query Execution Plans

Welcome to “Feature Friday”, a series of (typically short) videos which explain a feature, function, or technique.

Alerts are, for many monitoring engineers, the bread-and-butter of their job. What many fail to recognize is that, regardless of how graphical and “Natural English Language” the alert builder appears, what you are really creating is a query. Often it is a query which runs frequently (every minute or even more) against the entire database.

Because of that, a single, poorly constructed query can have a huge (and hugely negative) impact on overall performance. Get a few bad eggs, and the rest of the monitoring system – polling, display, reports, etc. – can slow to a crawl, or even grind to a halt.

Luckily there’s a tool which can help you discover a query’s execution performance, and identify where the major bottlenecks are.

In the video below, my fellow SolarWinds Head Geeks and I channel our inner SQLRockstar and dive into query execution plans and how to apply that technique to SolarWinds Orion alerts.
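If you want to poke at a suspect alert query yourself before (or after) watching, here’s one route to an estimated execution plan. To be clear, this is a hedged sketch rather than the method shown in the video: it assumes a SQL Server backend (which Orion uses), the pyodbc driver, and a placeholder connection string and query that you’d swap for your own.

```python
# Sketch: pull the *estimated* execution plan for a suspected "bad egg" alert query.
# Assumes a SQL Server backend and the pyodbc driver; the connection string and the
# query below are placeholders -- substitute your own Orion database and alert SQL.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-orion-db-server;DATABASE=SolarWindsOrion;Trusted_Connection=yes;"
)

ALERT_QUERY = """
SELECT NodeID, StatusDescription
FROM Nodes
WHERE Status <> 1
"""  # hypothetical example of the kind of SQL an alert definition boils down to

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    cursor = conn.cursor()
    # With SHOWPLAN_XML on, SQL Server returns the plan instead of running the query,
    # so this is safe to try against a busy production database.
    cursor.execute("SET SHOWPLAN_XML ON")
    cursor.execute(ALERT_QUERY)
    plan_xml = cursor.fetchone()[0]
    cursor.execute("SET SHOWPLAN_XML OFF")

# Save the plan and open the .sqlplan file in SQL Server Management Studio to see
# the graphical plan and hunt for table scans, key lookups, and other bottlenecks.
with open("alert_query_plan.sqlplan", "w", encoding="utf-8") as f:
    f.write(plan_xml)
```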


For more insights into monitoring, as well as random silliness, you can follow me on Twitter (@LeonAdato) or find me on the SolarWinds THWACK.com forums (@adatole)

Blueprint: The Evolution of the Network, Part 2

NOTE: This article originally appeared here.

If you’re not prepared for the future of networking, you’re already behind.

That may sound harsh, but it’s true. Given the speed at which technology evolves compared to the rate most of us typically evolve in terms of our skillsets, there’s no time to waste in preparing ourselves to manage and monitor the networks of tomorrow. Yes, this is a bit of a daunting proposition considering the fact that some of us are still trying to catch up with today’s essentials of network monitoring and management, but the reality is that they’re not really mutually exclusive, are they?

In part one of this series, I outlined how the networks of today have evolved from those of yesteryear, and what today’s new essentials of network monitoring and management are as a consequence. If you paid careful attention, you likely picked up on how the lessons from the past that I described helped shape those new essentials.

Similarly, today’s essentials will help shape those of tomorrow. Thus, as I said, getting better at leveraging today’s essentials of network monitoring and managing is not mutually exclusive from preparing for the networks of tomorrow.

Before delving into what the next generation of network monitoring and management will look like, it’s important to first explore what the next generation of networking will look like.

On the Horizon

Above all else, one thing is for certain: We networking professionals should expect tomorrow’s technology to create more complex networks resulting in even more complex problems to solve. With that in mind, here are the top networking trends that are likely to shape the networks of the future:

Networks growing in all directions
Fitbits, tablets, phablets and applications galore. The explosion of IoT, BYOD, BYOA and BYO-everything else is upon us. With this trend still in its infancy, the future of connected devices and applications will be not only about the quantity of connected devices, but also about the quality of their connections and the network bandwidth they consume.

But it goes beyond the gadgets end users bring into the environment. More and more, commodity devices such as HVAC infrastructure, environmental systems such as lighting, security devices and more all use bandwidth—cellular or WiFi—to communicate outbound and receive updates and instructions inbound. Companies are using, or planning to use, IoT devices to track product, employees and equipment. This explosion of devices that consume or produce data will, not might, create a potentially disruptive explosion in bandwidth consumption, security concerns and monitoring and management requirements.

IPv6 eventually takes the stage…or sooner (as in now!)
Recently, ARIN was unable to fulfill a request for IPv4 addresses because the request was greater than the contiguous blocks available. Meanwhile, IPv6 is now almost always enabled by default and is therefore creating challenges for IT professionals even if they, and their organizations, have committed to putting off their own IPv6 decisions. The upshot of all this is that IPv6 is a reality today. There is an inevitable and quickly approaching moment when switching over will no longer be an option, but a requirement.

SDN and NFV will become the mainstream
Software defined networking (SDN) and network function virtualization (NFV) are just in their infancy and should be expected to become mainstream in the next five to seven years. With SDN and virtualization creating new opportunities for hybrid infrastructure, a serious look at adoption of these technologies is becoming more and more important.

So long WAN Optimization, Hello ISPs
There are a number of reasons WAN optimization technology is being, and will increasingly be, kicked to the curb. With bandwidth increases outpacing the ability of CPUs and custom hardware to perform deep inspection and optimization, and with ISPs helping to circumvent the cost and complexities associated with WAN accelerators, WAN optimization will only see the light of tomorrow in unique use cases where the rewards outweigh the risks. As most of us will admit, WAN accelerators are expensive and complicated, making ISPs more and more attractive. Their future living inside our networks is certainly bright.

Farewell L4 Firewalling 
With the mass of applications and services moving toward web-based deployment, using Layer 4 (L4) firewalls to block these services entirely will not be tolerated. A firewall incapable of performing deep packet analysis and understanding the nature of the traffic at Layer 7 (L7), the application layer, will not satisfy the level of granularity and flexibility that most network administrators should offer their users. On this front, change is clearly inevitable for us network professionals, whether it means added network complexity and adapting to new infrastructures or simply letting withering technologies go.

Preparing to Manage the Networks of Tomorrow  

So, what can we do to prepare to monitor and manage the networks of tomorrow? Consider the following:

Understand the “who, what, why and where” of IoT, BYOD and BYOA
Connected devices cannot be ignored. According to 451 Research, mobile Internet of Things (IoT) and Machine-to-Machine (M2M) connections will increase to 908 million in just five years, compared to 252 million just last year. This staggering statistic should prompt you to start creating a plan of action for how you will manage nearly four times the number of devices infiltrating your networks today.

Your strategy can either aim to manage these devices within the network or set an organizational policy to regulate the traffic altogether. As nonprofit IT trade association CompTIA noted in a recent survey, many companies are trying to implement partial and even zero-BYOD policies to regulate security and bandwidth issues. Even though policies may seem like an easy fix, curbing all of tomorrow’s BYOD/BYOA is nearly impossible. As such, you will have to understand your network traffic at a granular level in order to optimize and secure it. Even more so, you will need to understand traffic from devices that aren’t even in your direct control, like the tablets, phablets and Fitbits, to properly isolate issues.

Know the ins and outs of the new mainstream 
As stated earlier, SDN, NFV and IPv6 will become the new mainstream. We can start preparing for these technologies’ future takeovers by taking a hybrid approach to our infrastructures today. This will put us ahead of the game with an understanding of how these technologies work, the new complexities they create and how they will ultimately affect configuration management and troubleshooting ahead of mainstream deployment.

Start comparison shopping now
Going through the exercise of evaluating ISPs, virtualized network options and other on-the-horizon technologies—even if you don’t intend to switch right now—will help you nail down your particular requirements. Sometimes, knowing that a vendor has or works with technology you don’t need right now, such as IPv6, but might need later can and should influence your decision.

Brick in, brick out
Taking on new technologies can feel overwhelming to those of us with “boots on the ground,” because the new technology can often simply seem like one more mouth to feed, so to speak. As much as possible, look for ways that potential new additions will not just enhance the old guard, but replace it. Maybe your new real-time deep packet inspection won’t completely replace L4 firewalls, but if it can reduce them significantly—while at the same time increasing insight and the ability to respond intelligently to issues—then the net result should be a better day for you. If you don’t do this, then more often than not, new technology will indeed simply seem to increase workload and do little else. This is also a great measuring stick for identifying new technologies whose time may not have truly come just yet, at least not for your organization.

At a more basic layer, if you have to replace three broken devices and you realize that the newer equipment is far more manageable or has more useful features, consider replacing the entire fleet of old technology even if it hasn’t fallen apart yet. The benefits of consistency often far outweigh the initial pain of sticker shock.

To conclude this series, my opening statement from part one merits repeating: learn from the past, live in the present and prepare for the future. The evolution of networking waits for no one. Don’t be left behind.