Using NetFlow monitoring tactics to solve bandwidth mysteries

NetFlow eliminated the hassle of network troubleshooting after a school complained about its Internet access.

Life in IT can be pretty entertaining. And apparently we admins aren’t the only ones who think so — Hollywood’s taken notice, too. The problem is, the television shows and feature films about IT are all comedies. I’m not saying we don’t experience some pretty humorous stuff, but I want a real show; you know, something with substance. I can see it now — The Xmit (and RCV) Files: The Truth is Out There.

In fact, I’ve already got the pilot episode all planned. It’s based on an experience I had not long ago with the NetFlow monitoring protocol.

The company I was with at the time offered monitoring as a service to a variety of clients. One day, I was holding the receiver away from my head as a school principal shouted, “The Internet keeps going down! You’ve got to do something.”

Now, there are few phrases that get my goat like “the Internet is down,” or its more common cousin, “the network is down.” So, my first thought was, “Buddy, we have redundant circuits, switches configured for failover and comprehensive monitoring. The network is not down, so please shut up.”

Of course, that’s not what I said. Instead, I asked a few questions that would help me narrow down the root cause of the problem.

First up: “How often are you experiencing this issue?”

“A bunch,” I was told.

“Ooookay … at any particular time?” I asked.

He replied, “Well, it seems kind of random.”

Gee, thanks. I’m sure I can figure it out with such insightful detail.

It was obvious I was going to have to do some real investigation. My first check was the primary circuit to our provider. Nothing there. So, I’m sorry, Virginia, but “the Internet” is not down, as if I had any doubt.

Next, I looked at the school’s WAN interface, which revealed that yes indeed, the WAN link to the school was becoming saturated at various intervals during the day. Usage would spike for 20 to 30 minutes, then drop down until the next incident. I checked the firewall logs (not my favorite job), which showed a high volume of HTTP connections at the same times.

Now, for many years, checking was the pinnacle of network troubleshooting — check the devices, check the logs, wait for the event to happen again, dig a little further. And in my case, that might have been all I could do. Our contract had us monitoring the entire core data center for the school system, but that only extended to the edge router for the school. We had exactly zero visibility beyond each individual school building’s WAN connection.

But as chance would have it, I had one more trick up my sleeve: NetFlow.

NetFlow has been around a while, but only in the last few years has it entered the common network admin lexicon, largely due to the maturation of tools that can reliably and meaningfully collect and display NetFlow data. NetFlow collects and correlates “conversations” between two devices as the data passes through a router. You don’t have to monitor the specific endpoints in the conversation; you just have to get data from one router that sits between the two points.
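
For anyone who thinks better in code, here is a rough sketch of what "correlating conversations" amounts to. It is not how any particular collector works; the FlowRecord type, its field names and the sample addresses are all made up for illustration. The idea is simply to fold both directions of traffic between two endpoints into one conversation and add up the bytes.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, heavily simplified flow record (not a real NetFlow layout)."""
    src: str
    dst: str
    byte_count: int


def summarize_conversations(records):
    """Total bytes per endpoint pair, treating A->B and B->A as one conversation."""
    totals = defaultdict(int)
    for rec in records:
        # Sort the endpoints so both directions share the same key.
        key = tuple(sorted((rec.src, rec.dst)))
        totals[key] += rec.byte_count
    # Busiest conversations first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data using documentation address ranges.
sample = [
    FlowRecord("10.0.0.5", "203.0.113.10", 1200),
    FlowRecord("203.0.113.10", "10.0.0.5", 48000),
    FlowRecord("10.0.0.7", "198.51.100.2", 900),
]
top = summarize_conversations(sample)
```

The first entry in top is the chattiest pair of endpoints, which is usually the first thing you want to know when a link is saturated.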


Hmm, that sounds a lot like a WAN router connected to the Internet provider, which is exactly what I had. Correlating the spike times from our bandwidth stats, we saw that during the same period, 10 MAC addresses were starting conversations with YouTube. Every time there was a spike, it was the same set of MAC addresses.
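
The correlation step is easy to sketch, too. The snippet below is only an illustration, assuming you have already exported flow start times and source identifiers from your collector. The function names, the timestamps and the MAC addresses are invented for the example. It collects the sources seen in each spike window and keeps only the ones that show up in every window.

```python
from datetime import datetime


def sources_in_window(flows, start, end):
    """Return the set of sources with a flow starting inside [start, end)."""
    return {src for ts, src in flows if start <= ts < end}


def recurring_sources(flows, windows):
    """Return the sources that appear in every one of the given spike windows."""
    per_window = [sources_in_window(flows, start, end) for start, end in windows]
    return set.intersection(*per_window) if per_window else set()


# Invented (timestamp, source MAC) pairs; not data from the actual incident.
flows = [
    (datetime(2024, 1, 8, 9, 5), "aa:bb:cc:00:00:01"),
    (datetime(2024, 1, 8, 9, 10), "aa:bb:cc:00:00:02"),
    (datetime(2024, 1, 8, 9, 12), "de:ad:be:00:00:09"),
    (datetime(2024, 1, 8, 13, 2), "aa:bb:cc:00:00:01"),
    (datetime(2024, 1, 8, 13, 20), "aa:bb:cc:00:00:02"),
]
# Spike windows as read off the bandwidth graphs (also invented).
windows = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 30)),
    (datetime(2024, 1, 8, 13, 0), datetime(2024, 1, 8, 13, 30)),
]
repeat_offenders = recurring_sources(flows, windows)
```

Taking the intersection rather than the union is the point: a device that appears in one spike could be a coincidence, while a device that appears in all of them is a suspect.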

Now, if we had been monitoring inside the school, we could have gleaned much more information — IP address, location, maybe even username if we had some of the more sophisticated user device tracking tools in place — but we weren’t. However, a visit to Wireshark’s OUI Lookup Tool revealed that all 10 of those MAC addresses were from — and please forgive the irony — Apple Inc.
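
A vendor lookup like that works because the first three octets of a MAC address form the OUI, the prefix assigned to the manufacturer. Here is a minimal sketch of doing the same thing locally. The oui_prefix and vendor_for helpers are hypothetical, and the one-entry vendor table is a placeholder, not real registry data; in practice you would load the published OUI list instead.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def oui_prefix(mac):
    """Return the first three octets of a MAC address as 'XX:XX:XX'."""
    # Drop separators such as ':', '-' or '.' and normalize the case.
    digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in (0, 2, 4))


# Placeholder table for illustration only; these are not real assignments.
VENDORS = {"AA:BB:CC": "Example Vendor"}


def vendor_for(mac, table=VENDORS):
    """Look up the vendor for a MAC address, or 'unknown' if the prefix is absent."""
    return table.get(oui_prefix(mac), "unknown")


print(vendor_for("aa-bb-cc-11-22-33"))
```

Normalizing the separators first matters because different devices and tools print MAC addresses with colons, hyphens or dots.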

At that point, I had exhausted all of the tools at my disposal. So, I called the principal back and gave him the start and stop times of the spikes, along with the information about 10 Apple products being to blame.

“Wait, what time was that?” he asked.

I repeated the times.

“Oh, for the love of … I know what the problem is.” Click.

It turns out the art teacher had been awarded a grant for 10 shiny new iPads. He would go from room to room during art period handing them out and teaching kids how to do video mashups.

This was one of those rare times when a bandwidth increase really was warranted, and after the school’s WAN circuit was reprovisioned, the Internet stopped mysteriously “going down.”

The episode would close with the handsome and sophisticated admin — played by yours truly, of course — looking into the camera and, channeling the great Fox Mulder, saying, “Remember, my fellow admins, the truth is out there.” (And, I would add, for those of you reading this blog post, don’t forget how valuable NetFlow can be in finding network truth.)

Now, if that’s not compelling TV, I don’t know what is.

(This article originally appeared on SearchNetworking)
