Absolutely simple infrastructure monitoring

(This article originally appeared on The Observatory)

Depending on your background, the process of instrumenting your applications, systems, and even your coffee pot makes perfect sense. Requests to “pipe this curl command through Bash” are the kind of thing you do every day.

Or, you know, maybe not.

I hail from a sysadmin and network engineer background. And while I’ve spent over two decades focusing on installing, configuring, and maintaining monitoring and observability solutions, code-heavy, developer-centric processes still leave me a little in the dark.

It’s nice when someone can show me exactly how something works and why it’s useful—and then make it simple to set up myself. So that’s exactly what I’m going to do here.

You’ll see how to use New Relic to monitor infrastructure like:

CPU
RAM
Storage
Network traffic

Then I’ll walk through the steps to install New Relic on a system you control, that’s NOT in production (because we’d NEVER test things in prod, right? RIGHT?). It can be a virtual machine (VM) running locally, a system in the private or public cloud, or even your actual machine sitting under (or on top of) your desk. It can be running on Windows, Linux, macOS, Docker, or Kubernetes. It doesn’t matter—because you can monitor any of it with New Relic.

Monitoring your infrastructure with New Relic

First, why would you want to monitor your infrastructure systems with New Relic at all? Let’s take a look at some of the key features that are helpful for sysadmins, including dashboards and alerts.

After you’ve installed New Relic and instrumented your system, New Relic’s Hosts dashboard gives you high-level information about your system’s CPU, RAM, storage, network traffic, and so on.

Infrastructure overview in New Relic includes CPU usage, memory usage, network traffic, load average, and other metrics.

You can easily access Metrics and Logs from the left-hand pane. More on those in a moment.

Every data point and statistic you see on screen can potentially be used to generate an alert or notification if something goes wrong. That way, you can use automation rather than waiting for a customer to call and ask “Is the internet down?”

The Metrics dashboard has everything from the Summary dashboard, but more of it, including network inbound and outbound (in bytes and packets), and dropped packets inbound and outbound. That’s alongside more information on CPU, RAM, and disk.

New Relic's Metrics tab in Explorer shows a wide range of infrastructure metrics.
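If you're curious where numbers like these come from on a Linux box, the kernel keeps running counters for every network interface in /proc/net/dev. The sketch below is only an illustration of the raw data behind "bytes, packets, and dropped packets". I'm not claiming this is how the New Relic agent collects it, and the function name net_counters is my own invention. It won't find anything on macOS or Windows, which don't have /proc.

```shell
# Print per-interface byte, packet, and drop counters from a file in
# /proc/net/dev format (Linux only). net_counters is a hypothetical helper.
net_counters() {
  # Skip the two header lines, then split "iface:numbers" into fields.
  awk 'NR > 2 {
    sub(/:/, " ")
    printf "%s rx_bytes=%s rx_packets=%s rx_dropped=%s tx_bytes=%s tx_packets=%s tx_dropped=%s\n",
      $1, $2, $3, $5, $10, $11, $13
  }' "$1"
}

if [ -r /proc/net/dev ]; then
  net_counters /proc/net/dev
else
  echo "/proc/net/dev not available on this system"
fi
```

These are cumulative totals since boot, so a monitoring tool has to sample them repeatedly and work out the difference to get a rate.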

The Logs tab, as the name implies, contains log messages.

Logs tab in New Relic Explorer shows log messages

New Relic’s overall logging capability supports a variety of inputs and sources. While it’s good to know you can instrument “anything,” I think it’s always important to know what the default behavior is going to be. So, here are the logs that will start generating messages in the dashboard right away.

For Linux and macOS systems, New Relic will forward all messages appearing in:

/var/log/alternatives.log
/var/log/cloud-init.log
/var/log/auth.log
/var/log/dpkg.log
/var/log/syslog
/root/.newrelic/newrelic-cli.log
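Not every system has every one of those files. A couple of them (dpkg.log and alternatives.log) are specific to Debian-family distributions, so don't be surprised if some are missing on your machine. Here's a minimal sketch that reports which of the listed paths exist on the host you're about to instrument. The function name check_logs is hypothetical, and the paths are simply the ones from the list above.

```shell
# Report which of the default log paths exist and are readable here.
# check_logs is a hypothetical helper, not part of any New Relic tool.
check_logs() {
  for log_path in "$@"; do
    if [ -r "$log_path" ]; then
      echo "present  $log_path"
    elif [ -e "$log_path" ]; then
      echo "unreadable  $log_path"
    else
      # Also reached when the parent directory is off-limits to this user.
      echo "not found  $log_path"
    fi
  done
}

check_logs \
  /var/log/alternatives.log \
  /var/log/cloud-init.log \
  /var/log/auth.log \
  /var/log/dpkg.log \
  /var/log/syslog \
  /root/.newrelic/newrelic-cli.log
```

Run it as a regular user and the /root path will most likely show up as "not found", because you can't see inside /root without elevated privileges.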

For Windows systems, the New Relic agent will pass along messages from the following locations:

Security event log entries with the following event IDs:
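- 4740 (a user account was locked out)
- 4728 (a member was added to a security-enabled global group)
- 4732 (a member was added to a security-enabled local group)
- 4756 (a member was added to a security-enabled universal group)
- 4735 (a security-enabled local group was changed)
- 4624 (an account was successfully logged on)
- 4625 (an account failed to log on)
- 4648 (a logon was attempted using explicit credentials)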
All events from the Application event log.
All messages that appear in newrelic-cli.log in the <home directory>\.newrelic folder of the user who ran the infrastructure agent installation.

You can also access the Events explorer and Metrics explorer from the left-hand pane. If you’re interested in learning more about them, check out Introduction to the data explorer.

Setting up New Relic

Now that you have an appreciation for both the ease of navigation and the range of data you’re able to collect and display, you might be wondering how to install New Relic yourself. If the level of difficulty ranges from “connecting my wireless mouse” to “setting up an internet-connected coffee pot,” it’s closer to the “mouse” side of the scale. You don’t need to compile code, download multiple libraries, or choose between features or modules before you’ve even had a chance to test the system out.

Here’s what you need to get started:

An active New Relic account (https://newrelic.com/signup)
A system you want to instrument.
A connection to the system where you can cut-and-paste commands.
A connection from the system to the internet.
Optionally, a tool or command to stress test the machine’s CPU, RAM, or disk I/O. stress-ng and Prime95 are two examples.
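Before you start, you can sanity-check a couple of those items from the terminal on a Linux or macOS system. The sketch below tests outbound internet access and looks for stress-ng. The helper names and the TARGET_URL default are my own assumptions for illustration; any site you expect to be reachable would do.

```shell
# Quick prerequisite check: outbound internet access and a stress tool.
# TARGET_URL and the helper names are assumptions for this sketch.
TARGET_URL="${TARGET_URL:-https://newrelic.com}"

have_tool() {
  command -v "$1" >/dev/null 2>&1
}

report_tool() {
  if have_tool "$1"; then
    echo "$1: installed"
  else
    echo "$1: not installed"
  fi
}

check_internet() {
  if ! have_tool curl; then
    echo "internet: cannot check (curl not installed)"
  elif curl --silent --head --output /dev/null \
            --connect-timeout 5 --max-time 10 "$1"; then
    echo "internet: reachable ($1)"
  else
    echo "internet: NOT reachable ($1)"
  fi
}

check_internet "$TARGET_URL"
report_tool stress-ng
```

A "not installed" result for stress-ng isn't a blocker. It's the optional item on the list.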

Installation Steps

Log into your New Relic account. In the left-hand column, select Add more data.

Add more data link circled in left-hand pane.

Choose Guided install, which will help you install the main infrastructure agent.

Choose your operating system.

Installation plan shows different operating system environments you can select.

Select Begin installation. From the following screen, copy the command and then paste it into the terminal or remote session connected to the system you want to monitor.

Window has a command that should be copied and run for installation.

After the installation is finished, the installer prints a link to the New Relic dashboard in the terminal or remote session where you ran the command. Alternatively, you can select See your data in New Relic.

Option to see your data in New Relic One

If any issues come up during the process, the guided install will offer commands, documentation, and suggestions on how to move forward.
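If you'd like a second opinion from the machine itself, you can check whether the agent is actually running. This is a minimal sketch for Linux, and it rests on an assumption: that the infrastructure agent runs as a service (and process) named newrelic-infra. If your install differs, set SERVICE to whatever name you see on your system.

```shell
# Check whether the infrastructure agent appears to be running (Linux).
# SERVICE is an assumption: the agent's service/process name on this host.
SERVICE="${SERVICE:-newrelic-infra}"

agent_status() {
  if command -v systemctl >/dev/null 2>&1; then
    # systemd hosts: ask the service manager.
    if systemctl is-active --quiet "$1" 2>/dev/null; then
      echo "$1: active"
    else
      echo "$1: not active"
    fi
  elif command -v pgrep >/dev/null 2>&1; then
    # Fallback: look for a process with exactly this name.
    if pgrep -x "$1" >/dev/null 2>&1; then
      echo "$1: running"
    else
      echo "$1: not running"
    fi
  else
    echo "$1: cannot check (no systemctl or pgrep)"
  fi
}

agent_status "$SERVICE"
```

If it reports the agent isn't active, the suggestions from the guided install are the place to start.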

Perturbing the system (but only in test)

There are many utilities that will spike the CPU, fill up RAM, or push the disk I/O to the ceiling, so I’m leaving it to your creativity to choose which one to use.

On my test system, after spiking the CPU for 10 minutes and then letting it cool back down, this is what my dashboard looked like.

Terminal shows stress test while New Relic dashboard shows results of stress test.

I used the following stress-ng command for this example:

stress-ng --matrix 0 -t 10m

Of course, there are many other ways a system can break, from single-point-of-failure situations like a failing disk or exhausted RAM to the complex multi-element cascades that you’re more likely to see in real life. There are other stress-ng commands you can use to stress CPU, RAM, disk I/O, and other subsystems.
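To give you a starting point, here's a small sketch with one stress-ng command each for CPU, RAM, and disk. The worker counts, sizes, and the two-minute duration are arbitrary values I picked for illustration, not recommendations. The script is a dry run by default: it only prints what it would execute. Set RUN=1 to actually run the commands, and only on a test machine.

```shell
# Dry run by default: prints each stress-ng command instead of running it.
# Set RUN=1 to execute. All sizes and durations are illustrative guesses.
DURATION="${DURATION:-2m}"

run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# One worker per online CPU (0 means "all of them").
stress_cpu()  { run stress-ng --cpu 0 --timeout "$DURATION"; }
# Two workers that together allocate about 75% of available memory.
stress_ram()  { run stress-ng --vm 2 --vm-bytes 75% --timeout "$DURATION"; }
# Two workers that each write and read a 1 GB temporary file.
stress_disk() { run stress-ng --hdd 2 --hdd-bytes 1g --timeout "$DURATION"; }

stress_cpu
stress_ram
stress_disk
```

Running them one at a time makes it easier to match each spike on the dashboard to the subsystem that caused it.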

Honestly, I had so much fun using stress-ng to play “will it crash” with my system that I’m planning another blog post where I beat up a system and show you what it looks like in New Relic.
