Prometheus is a metrics gathering system used for monitoring and alerting. It discovers your systems, pulls in metrics from them and can generate alerts when certain thresholds or patterns are met. For example, it can alert you when a filesystem is getting full, or your web service response times have increased unexpectedly.
It’s been gathering momentum in the Kubernetes world lately, but it isn’t exclusive to that ecosystem and can be put to great use anywhere you need to monitor systems.
As opposed to systems like statsd or collectd, Prometheus works primarily on a pull model and connects out to your systems (which it calls targets) to read metrics. So you need to ensure that Prometheus knows about your servers (discovery) and is able to connect to them (access).
Prometheus can be statically configured with each server you want to monitor, but modern cloud-driven systems are often dynamic, so auto-discovery of servers is preferable. You can easily get this working on Brightbox while automatically setting up firewalling too: the trick is to use a server group to set up the firewall access for Prometheus, and then have Prometheus use that same server group’s DNS name to discover the servers.
Then all you have to do to start monitoring a server with Prometheus is to put it in a specific server group, and the rest happens automatically.
First, let’s quickly set up a Prometheus server. We’ll use our CLI here, but you can do all this with the Brightbox Manager GUI, or Terraform if you prefer.
Create a server group and associated firewall policy for your Prometheus servers:
$ brightbox group create --name="prometheus servers"
Creating a new server group
id server_count name
---------------------------------------------
grp-czxmr 0 prometheus servers
---------------------------------------------
And create some firewall rules to allow you to SSH in and allow the server to connect out:
$ brightbox firewall-policies create --name="prometheus servers" grp-czxmr
id server_group name
---------------------------------------------
fwp-wkigv grp-czxmr prometheus servers
---------------------------------------------
$ brightbox firewall-rules create --source=any --protocol=tcp --dport=22 fwp-wkigv
id protocol source sport destination dport icmp_type description
--------------------------------------------------------------------------------
fwr-rynxl tcp any - - 22 -
--------------------------------------------------------------------------------
$ brightbox firewall-rules create --destination=any fwp-wkigv
id protocol source sport destination dport icmp_type description
--------------------------------------------------------------------------------
fwr-cckb8 - - - any - -
--------------------------------------------------------------------------------
Build an Ubuntu 20.04 (Focal) server in that group:
$ brightbox images | grep focal
img-uitch brightbox official 2020-09-08 public 2252 ubuntu-focal-20.04-amd64-server (x86_64)
$ brightbox server create --server-groups=grp-czxmr --name="prometheus server" img-uitch
Creating a 1gb.ssd (typ-w0hf9) server with image ubuntu-focal-20.04-amd64-server (img-uitch) in groups grp-czxmr
id status type zone created_on image_id cloud_ip_ids name
---------------------------------------------------------------------------------------------
srv-vzcpl creating 1gb.ssd gb1-a 2020-09-14 img-uitch prometheus server
---------------------------------------------------------------------------------------------
SSH in (if you don’t have IPv6 then you’ll need to create and map a Cloud IP to the server first to give it a public IPv4 address) and install the Prometheus server package:
$ ssh ubuntu@ipv6.srv-vzcpl.gb1.brightbox.com
ubuntu@srv-vzcpl:~$ sudo apt-get install -qy prometheus
Now to set up the targets. I’m going to assume you’re using the Prometheus Node exporter (install the prometheus-node-exporter package), which listens on TCP port 9100, but if you’re using other exporters, set up the appropriate firewall rules.
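To make the pull model concrete, here is a minimal Python sketch of what a single scrape amounts to: an HTTP GET of the exporter's `/metrics` page, followed by parsing the plain-text samples. The function names `parse_metrics` and `scrape` are hypothetical, and the parser assumes simple lines with no trailing timestamp, so treat it as an illustration rather than a full implementation of the exposition format.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse simple 'name{labels} value' lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(host, port=9100, timeout=5):
    """Fetch and parse an exporter's /metrics page, roughly as one scrape would."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# A small hand-written sample in the text format, used instead of a live server.
sample = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_filesystem_free{device="/dev/vda1",mountpoint="/"} 5.98612992e+10
"""
parsed = parse_metrics(sample)
print(parsed["node_load1"])
```

Running `scrape("10.226.1.1")` from a server allowed through the firewall would fetch the real page. The address here is only borrowed from the example output later in this post.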
So, create another server group and an associated firewall policy:
$ brightbox group create --name="prometheus targets"
Creating a new server group
id server_count name
---------------------------------------------
grp-4zxb6 0 prometheus targets
---------------------------------------------
$ brightbox firewall-policies create --name="prometheus targets" grp-4zxb6
id server_group name
---------------------------------------------
fwp-rfzwh grp-4zxb6 prometheus targets
---------------------------------------------
Then create a firewall rule to allow access to your exporter from your Prometheus servers (use the “Prometheus servers” server group created earlier):
$ brightbox firewall-rules create --source=grp-czxmr --protocol=tcp --dport=9100 fwp-rfzwh
id protocol source sport destination dport icmp_type description
-----------------------------------------------------------------------------------
fwr-6chx0 tcp grp-czxmr - - 9100 -
-----------------------------------------------------------------------------------
Now hop back onto your Prometheus server to edit the config file /etc/prometheus/prometheus.yml and tell it to check the target server group DNS for new targets:
scrape_configs:
  - job_name: grp-4zxb6
    dns_sd_configs:
      - port: 9100
        type: 'A'
        names:
          - grp-4zxb6.gb1.brightbox.com
Then reload Prometheus:
$ sudo service prometheus reload
At this point, Prometheus will start periodically resolving the grp-4zxb6.gb1.brightbox.com DNS name to check for new members and will start monitoring them.
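As a rough illustration of what A-record discovery does, the Python sketch below resolves a name to its IPv4 addresses and pairs each one with the exporter port. The function name `discover_targets` is hypothetical, and this is only an approximation of the idea, not Prometheus's own discovery code.

```python
import socket


def discover_targets(name, port=9100):
    """Resolve a DNS name to IPv4 addresses and return 'address:port' targets."""
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result ends with a (address, port) tuple; de-duplicate and sort.
    addresses = sorted({info[4][0] for info in infos})
    return [f"{address}:{port}" for address in addresses]
```

Calling `discover_targets("grp-4zxb6.gb1.brightbox.com")` from a machine that can resolve the group name should return one entry per group member, in the same `address:9100` form that shows up in the instance labels.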
So, to start monitoring a server, just add it to the “prometheus targets” group. The appropriate firewall rules will be immediately applied so Prometheus can reach it, and Prometheus will discover the new server within a minute or so and start collecting metrics.
You can run some queries on the Prometheus server itself to confirm:
$ promtool query instant http://localhost:9090 'node_load1{job="grp-4zxb6"}'
node_load1{instance="10.226.1.1:9100", job="grp-4zxb6"} => 0 @[1600101658.181]
node_load1{instance="10.243.2.2:9100", job="grp-4zxb6"} => 0 @[1600101658.181]
$ promtool query instant http://localhost:9090 'node_filesystem_free{job="grp-4zxb6",mountpoint="/"}'
node_filesystem_free{device="/dev/vda1", fstype="ext4", instance="10.226.1.1:9100", job="grp-4zxb6", mountpoint="/"} => 59861299200 @[1600100976.016]
node_filesystem_free{device="/dev/vda1", fstype="ext4", instance="10.243.2.2:9100", job="grp-4zxb6", mountpoint="/"} => 59625115648 @[1600100976.016]
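If you would rather query from a script, Prometheus also serves instant queries over its HTTP API at `/api/v1/query`. The Python sketch below builds the request URL and pulls the instance and value out of the JSON response. The helper names are hypothetical, and it assumes the server is listening on localhost:9090 as in the promtool examples.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, expr):
    """Build the instant-query URL for a PromQL expression."""
    params = urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def extract_samples(payload):
    """Return (instance, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return [
        (result["metric"].get("instance"), float(result["value"][1]))
        for result in payload["data"]["result"]
    ]


def instant_query(base_url, expr, timeout=5):
    """Run an instant query against a Prometheus server and parse the result."""
    with urlopen(instant_query_url(base_url, expr), timeout=timeout) as resp:
        return extract_samples(json.load(resp))
```

For example, `instant_query("http://localhost:9090", 'node_load1{job="grp-4zxb6"}')` run on the Prometheus server should return one pair per discovered target.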
As Prometheus is using our group DNS system and obtaining A records, the instance labels have IP addresses rather than the more convenient server identifiers. Prometheus can instead query SRV records to retrieve server names, so we’re looking at adding support for that in future.
And remember you can customize labels in the job using relabel configs. We use that to label things as staging or production, and sometimes to add explicit project and role labels:
scrape_configs:
  - job_name: grp-4zxb6
    dns_sd_configs:
      - port: 9100
        type: 'A'
        names:
          - grp-4zxb6.gb1.brightbox.com
    relabel_configs:
      - target_label: environment
        replacement: staging
      - target_label: project
        replacement: myapp
      - target_label: role
        replacement: webserver
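To show the effect of those rules, here is a deliberately simplified Python model of the static case used above, where each rule just sets `target_label` to a fixed `replacement`. The function name `apply_static_relabels` is hypothetical, and real relabelling also supports source labels, regular expressions and other actions that this sketch ignores.

```python
def apply_static_relabels(labels, relabel_configs):
    """Apply rules that only set target_label to a fixed replacement value."""
    result = dict(labels)  # copy so the discovered labels are left untouched
    for rule in relabel_configs:
        result[rule["target_label"]] = rule["replacement"]
    return result


rules = [
    {"target_label": "environment", "replacement": "staging"},
    {"target_label": "project", "replacement": "myapp"},
    {"target_label": "role", "replacement": "webserver"},
]
discovered = {"instance": "10.226.1.1:9100", "job": "grp-4zxb6"}
print(apply_static_relabels(discovered, rules))
```

Every target in the job ends up with the same three extra labels, which is what makes them handy for filtering by environment, project or role in queries.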
Now, Prometheus can do a lot more; it keeps time-series metric data so you can see the data over time, but you’ll probably want a UI for that - take a look at Grafana.
And Prometheus is usually paired with Alertmanager: Prometheus periodically evaluates defined alerting queries, and Alertmanager sends out alerts when certain conditions are met. There are packages for that in Ubuntu too, so it’s not too hard to get up and running.
So there is lots more for you to explore.
If you want to play with Prometheus, you can sign up for Brightbox in just a couple of minutes and use your £50 free credit to give it a go.
If instead you want us to run Prometheus for you, or anything else for that matter, we offer managed services and hands-on support. Drop us a line.