For my current personal project, a dashboard that displays metrics of my servers and devices among other things, I was faced with the challenge of collecting metrics from all devices. Solutions like Prometheus are popular for use cases like this, but since Prometheus scrapes metrics from an HTTP endpoint on each host, that would have meant opening ports on every server, which I did not want. So I needed an intermediary that my devices push data to and from which I can then retrieve it. Because I already use the Datadog agent on all my devices, this service appeared to be a solution to my issue. With Datadog as the intermediary for gathering metrics, I created a little tool that retrieves my data in the format I need.
Gathering metrics from the Datadog API
Datadog offers a REST API that can be easily consumed. They even offer a collection for Postman, a tool for API testing. However, it did not work well for me; I could not determine why, but I suspect the files are somehow broken or incompatible with my version of Postman. Most requests require two keys: the API key and the application key. Most API endpoints are helpful for retrieving metadata, but I needed the metrics themselves, and the query-timeseries-points endpoint offered everything I needed.
Because I wanted to retrieve data for every host, I needed to scope each retrieval to a specific host. The metric itself is passed to the endpoint in the query parameter, while from and to define the time frame as Unix timestamps in seconds:
https://api.datadoghq.eu/api/v1/query?from=1623852965&to=1623853038&query=system.cpu.idle{*}by{tobey-nuc}
The request headers must contain DD-API-KEY and DD-APPLICATION-KEY with valid values for the call to succeed.
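As a minimal sketch of such a call, assuming Java 11's built-in java.net.http client rather than whatever the tool actually uses, the request could be assembled as follows. The class and method names are hypothetical. Note that the metric query has to be URL-encoded, because characters such as { and } are not allowed unencoded in a URI.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a GET request for the Datadog v1 query endpoint.
final class DatadogQueryRequest {

    // EU site, as in the example URL above; other Datadog regions use a different host.
    private static final String BASE_URL = "https://api.datadoghq.eu/api/v1/query";

    static HttpRequest build(String query, long fromEpochSeconds, long toEpochSeconds,
                             String apiKey, String appKey) {
        // Braces in the metric query are illegal in a raw URI, so percent-encode it.
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        URI uri = URI.create(BASE_URL
                + "?from=" + fromEpochSeconds
                + "&to=" + toEpochSeconds
                + "&query=" + encodedQuery);

        // Both keys are sent as request headers.
        return HttpRequest.newBuilder(uri)
                .header("DD-API-KEY", apiKey)
                .header("DD-APPLICATION-KEY", appKey)
                .GET()
                .build();
    }
}
```

Sending it would then be a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), with the keys taken from wherever you keep secrets, for example environment variables.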
Pointlist data
The response contains some metadata and, for each returned series, an array called pointlist. This array holds the metric data itself and looks something like this:
"pointlist": [ [ 1623852966000.0, 76.36530190991344 ], [ 1623852971000.0, 73.71280104027153 ], ...
The pointlist consists of nested arrays: the first entry of each nested array is the timestamp at which the metric was recorded (in milliseconds since the Unix epoch), and the second is the value of the metric. This pointlist comes from the request shown above, so it represents idle CPU in percent.
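To make this structure concrete, here is a small sketch that turns the raw pairs into typed points. It assumes the pointlist has already been parsed from JSON into a double[][] by a JSON library, and that every entry carries a value; MetricPoint is a hypothetical name, not part of any Datadog SDK.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A single pointlist entry: when the value was recorded and the value itself.
record MetricPoint(Instant timestamp, double value) {

    // Converts raw [timestamp, value] pairs into typed points.
    // Assumes the JSON pointlist was already parsed into double[][].
    static List<MetricPoint> fromPointlist(double[][] pointlist) {
        List<MetricPoint> points = new ArrayList<>(pointlist.length);
        for (double[] pair : pointlist) {
            // The first element is milliseconds since the epoch, sent as a float.
            Instant timestamp = Instant.ofEpochMilli((long) pair[0]);
            points.add(new MetricPoint(timestamp, pair[1]));
        }
        return points;
    }
}
```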
My goal was to calculate the average over the given time frame, so I needed to sum up all values in the current pointlist and divide by the number of entries. To retrieve the uptime of a host, I only needed the last entry. Because all this is quite tedious in a script, I went for a Java application. Coincidentally, I was looking for an excuse to play around with Micronaut, so this project allowed me to do just that. I have open-sourced my approach on my GitHub:
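The two reductions could be sketched as follows, again assuming a pointlist parsed into double[][] and ordered chronologically, as in the sample response. PointlistStats is a hypothetical helper, not the code from the repository.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

// Hypothetical helpers for the two reductions: average and most recent value.
final class PointlistStats {

    // Average over the time frame: sum of all values divided by their count.
    // Empty if the pointlist contains no entries.
    static OptionalDouble average(double[][] pointlist) {
        return Arrays.stream(pointlist)
                .mapToDouble(point -> point[1])
                .average();
    }

    // Value of the last entry, e.g. for the uptime metric.
    // Assumes the entries are sorted by timestamp, oldest first.
    static OptionalDouble last(double[][] pointlist) {
        if (pointlist.length == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(pointlist[pointlist.length - 1][1]);
    }
}
```

Returning OptionalDouble makes the empty case explicit, for example when a host reported no data in the time frame, instead of silently producing a misleading number.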
The tool is fully dockerized and runs in a container with a small footprint, thanks to Micronaut. It works on a schedule and retrieves the data every 10 minutes. To work with the finalized data and display it on my Home Assistant dashboard, I use my InfluxDB instance: the tool persists the calculated values there.
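For the persistence step, InfluxDB accepts points in its line protocol: a measurement name, comma-separated tags, fields, and a timestamp that defaults to nanosecond precision. The sketch below formats one calculated value as such a line. The measurement name, the host tag, and the value field are assumptions for illustration, not the schema the tool actually uses.

```java
import java.math.BigDecimal;
import java.time.Instant;

// Hypothetical formatter for a single InfluxDB line-protocol point.
final class InfluxLine {

    // Tag keys and values must escape commas, equals signs, and spaces.
    private static String escapeTag(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Measurement names must escape commas and spaces.
    private static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Builds: measurement,host=<host> value=<value> <timestamp in nanoseconds>
    static String format(String measurement, String host, double value, Instant time) {
        long nanos = time.getEpochSecond() * 1_000_000_000L + time.getNano();
        // toPlainString avoids exponent notation such as 1.2345678E7 for large values.
        String field = BigDecimal.valueOf(value).toPlainString();
        return escapeMeasurement(measurement)
                + ",host=" + escapeTag(host)
                + " value=" + field
                + " " + nanos;
    }
}
```

In Micronaut, the 10-minute cycle could be driven by a bean method annotated with @Scheduled(fixedRate = "10m") that fetches the metrics, aggregates them, and writes lines like these to InfluxDB. Note that BigDecimal.valueOf rejects NaN and infinite values, which line protocol cannot represent anyway.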
With this approach I was able to get the desired data from my devices without opening any ports, simply by using the data already available on Datadog. By the way, Datadog is free for up to 5 hosts, so I highly recommend taking a look at it.