Simple Script Metrics with Flex

(this post originally appeared on the New Relic Blog)

After completing the 3-part series on collecting Pi-hole metrics with New Relic Flex last month, I realized I had never posted the original series that introduces Flex from the ground up. So that’s what I’ll be doing over the next 3 weeks. Enjoy!

In my last post on the New Relic Flex integration, I talked about times when pre-made New Relic agents and quickstarts are missing a critical metric or value. In that post, I used a simple example: I took the output of a standard command (df for Linux systems and netstat for Windows), pushed it to New Relic, and displayed it as a chart on a dashboard. In cases like this, Flex is a handy option to close the gap and get you the data you need.

In this post, I want to take that concept a step further. There are situations where those simple scenarios aren’t enough. This can be because:

The output isn’t in the proper format.
The data needed is the result of multiple commands.
There isn’t an existing command.

In those cases, IT practitioners often turn to scripting to reformat, combine, or create the data needed. And so the question is how to get the result of those scripts into New Relic.

The good news is this isn’t significantly different from running built-in commands. This post covers a few of the nuances to help you avoid common pitfalls as you get started.

Testing your internet speed

Let’s continue working on the example from the previous post in this series and expand on it.

A common question we all have, whether in a corporate setting or at home, is “Am I getting the internet speed I’m paying for?” To answer this, install Ookla’s Speedtest.net command line utility using these instructions.

After it’s installed, test it out to ensure it works the way we’ll need it for this tutorial. Use this code in the command line:

speedtest --accept-license -f csv

You should see output like this:

There are some problems with that output.

First, we don’t need all of that information.
Second, the main data we want (upload/download speed) is in a raw format, which is bytes per second. To get the megabits per second (Mbps) value we expect, we need to divide it by 125,000.
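
Where does 125,000 come from? One megabit is 1,000,000 bits and one byte is 8 bits, so 1 Mbps works out to 125,000 bytes per second. Here’s a minimal sketch of that arithmetic in Python. The function and constant names are made up for illustration and aren’t part of the Speedtest CLI or Flex.

```python
BYTES_PER_MEGABIT = 1_000_000 / 8  # 125,000 bytes in one megabit


def bytes_per_sec_to_mbps(bytes_per_sec):
    """Convert a raw bytes-per-second reading to megabits per second."""
    return bytes_per_sec / BYTES_PER_MEGABIT


# A raw reading of 12,500,000 bytes per second is 100 Mbps.
print(bytes_per_sec_to_mbps(12_500_000))
```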

I’ll walk through the magic of a script I’ve written called NR_checkspeed.py later. For now, let’s use it to reduce the output to three values:

Latency
Download speed (in Mbps)
Upload speed (in Mbps)

What’s the YAML file look like?

If you’ve been following along from the first post, this file doesn’t look terribly different. I want to draw your eye to a couple of specifics:

integrations:
  - name: nri-flex
    timeout: 5m
    interval: 10m
    config:
      name: linuxspeedtest
      apis:
        - name: pyspeedtest
          commands:
            - run: sudo python3 /etc/newrelic-infra/integrations.d/NR_checkspeed.py
              split: horizontal
              split_by: \s+
              set_header: [latency,download,upload]
              timeout: 300000

The first point is purely for the production implementation. You don’t want to be checking your internet speed every 10 minutes. So ratchet up the interval line to: interval: 1h.

Meanwhile, if you’ve run speedtest from the command line, you know it’s not exactly snappy. By default, Flex expects a command to finish in under 30 seconds, so that’s not going to work. Setting the timeout inside the commands block will fix it: timeout: 300000.

That’s five minutes converted to milliseconds.

- run: sudo python3 /etc/newrelic-infra/integrations.d/NR_checkspeed.py

A quick reminder that if you’re running this on a Windows system, your directory would be C:\Program Files\New Relic\newrelic-infra\integrations.d.

Now we’ll move on to the way the output is handled:

split: horizontal

This tells Flex to take multiple values that appear in a line and break them up into separate data points.

split_by: \s+

This tells Flex where one value ends and the next begins: at any run of one or more whitespace characters. The pattern uses standard regular expression syntax.

set_header: [latency,download,upload]

This line sets the headers, which we’ll use to set up our NRQL query in one.newrelic.com.
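
To make those three settings concrete, here’s a rough Python sketch of the idea: split one line of output on runs of whitespace, then pair each value with a header name. This only illustrates the concept. It is not how Flex is actually implemented, and the function name and sample values are made up.

```python
import re


def split_horizontal(line, split_by, headers):
    """Split one output line on a regex and label each value with a header."""
    values = re.split(split_by, line.strip())
    return dict(zip(headers, values))


# Sample values are invented for illustration.
sample = "12.5   93.1   11.2"
print(split_horizontal(sample, r"\s+", ["latency", "download", "upload"]))
```

Because the pattern matches any amount of whitespace, it doesn’t matter whether the script separates its values with one space or several.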

For more about these settings, along with all of the Flex configuration options, check out our on-host integrations documentation.

“Troubleshooting” and other swear words

In my previous post, I didn’t dig into the process of checking if things are working, or if they aren’t, why not.

Without the following techniques, the only way you’ll know if the Flex integration works is to set up a simple NRQL query and keep checking the output for error messages. But there are better options.

Run Flex manually

You can run the New Relic Flex utility in the command line and see the output immediately. Find the utility at: /var/db/newrelic-infra/newrelic-integrations/bin/nri-flex. While there are a bunch of command line options, the ones you want for troubleshooting are --verbose and --pretty.

So…presuming our sample YAML file from the earlier section is named py_ookla-speedtest.yml, our command would be:

sudo /var/db/newrelic-infra/newrelic-integrations/bin/nri-flex --verbose --pretty --config_file ./py_ookla-speedtest.yml

The output of that command would look something like this:

Logging

You can also set the logging level of the nri-flex utility itself. Edit the file /etc/newrelic-infra.yml and include these items:

log:
  file: '/var/log/newrelic-infra/newrelic-infra.log'
  level: debug
  forward: true
  stdout: false

For more information on logging options, check out the documentation on infrastructure agent options.

The necessary NRQL

If everything works, you’re ready to start showing the data in New Relic. Head over to your New Relic portal, open up a NRQL window, and type the following query:

FROM pyspeedtestSample SELECT *

This will show you if you’re collecting data at all, and if so, whether you’re getting an error or data. Once again, I’m going to presume you’ve got data coming in and you see output that looks something like this:

As you can see from the screenshot, we’ve got download metrics coming in. Scroll to the right and you’ll also see columns for upload and latency. To chart these in NRQL, show the latency on one chart and the upload and download speeds on a separate chart. The queries would be:

FROM pyspeedtestSample SELECT average(latency) TIMESERIES

and

FROM pyspeedtestSample SELECT average(download), average(upload) TIMESERIES

After your query is working, give the query a name, select the graph type, and assign it to one of your dashboards:

The script behind the scenes

If your goal is to learn how to use the New Relic Flex integration, the script you use is important. At the same time, it’s not essential for you to use my exact version. If you want to create your own in your preferred language, that’s absolutely fine. I’m including my script here, along with comments about which parts are noteworthy, simply for reference and convenience.

import os
import subprocess
#========================================
#Define variables
speedlist = speedrun = ""
latency = download = upload = ""
batch = response = ""
#========================================
#Function Junction
def fixnum(x):
  x = x.replace('"', '')
  x = float(x)
  return x
#========================================
speedrun = os.popen("speedtest --accept-license -f csv").read()
speedlist = speedrun.split(",")
latency = fixnum(speedlist[3])
download = fixnum(speedlist[6]) / 125000
upload = fixnum(speedlist[7]) / 125000
print(latency, " ", download, " ", upload)

To break this down a bit:

import os
import subprocess

This imports two modules for running external commands and getting the results back into the program. (The script as written only calls os; subprocess is imported but not used.)

def fixnum(x):
  x = x.replace('"', '')
  x = float(x)
  return x

This subroutine removes the quotes around a value from the CSV output and converts it to a number.

speedrun = os.popen("speedtest --accept-license -f csv").read()

This runs the speedtest command and grabs the results.
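
If you’d rather use the subprocess module, the standard library’s subprocess.run can do the same job, and it adds a timeout and an exit-code check. Here’s a minimal sketch. The run_command and run_speedtest names and the 300-second timeout are my own choices for illustration, not part of the original script.

```python
import subprocess


def run_command(args, timeout_seconds=300):
    """Run a command and return its standard output as text.

    Raises CalledProcessError on a non-zero exit code and
    TimeoutExpired if the command runs longer than timeout_seconds.
    """
    result = subprocess.run(
        args,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=True,
    )
    return result.stdout


def run_speedtest():
    """Run the same speedtest command the script passes to os.popen."""
    return run_command(["speedtest", "--accept-license", "-f", "csv"])
```

Passing the command as a list avoids going through a shell, and a failed or hung speedtest raises an exception instead of quietly returning an empty string.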

speedlist = speedrun.split(",")

This splits the results into multiple values.
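
One thing to watch with a plain split(","): CSV fields are quoted, and a quoted field can contain a comma, which shifts the position of every value after it. Python’s csv module handles that case. The sketch below compares the two approaches on a made-up line (the field values are invented, not real speedtest output). If you switch to csv, check the index numbers against your own output, because they may differ from the ones a plain split gives you.

```python
import csv

# Invented sample line: the first field contains a comma inside quotes.
sample = '"Example ISP - Springfield, ST","1234","12.5"'

naive = sample.split(",")
parsed = next(csv.reader([sample]))

print(len(naive))   # the quoted comma creates an extra piece
print(len(parsed))  # csv keeps the quoted field together
print(parsed[2])    # quotes are already stripped from each field
```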

Then, you’ll need to take each of the component values and assign them names. Also, divide the download and upload numbers by 125,000 to convert them from bytes per second to megabits per second, as shown here:

latency = fixnum(speedlist[3])
download = fixnum(speedlist[6]) / 125000
upload = fixnum(speedlist[7]) / 125000

Finally, you’ll need to echo the results so they can be picked up by Flex, as shown here:

print(latency, " ", download, " ", upload)