Monitoring with Prometheus on Docker

Shubham Thakur · Published in DevOps.dev · Sep 17, 2022

After deploying an application, it is very important to monitor the health and various statistics of the container. Prometheus is a tool which can be used to monitor various metrics and show graphs for the services.

In this tutorial, we will see how to configure Prometheus using its config file and deploy it on Docker. We will perform the following:

  1. Deploy the Prometheus container on Docker.
  2. Configure it using the Prometheus config file.
  3. Explore the Prometheus web UI.
  4. Create custom rules for Prometheus.
  5. Use docker compose to deploy.
  6. Create alerts on conditions.
  7. Explore exporters and use them to collect metrics.
  8. Explore Alertmanager and configure it using its config file.
  9. Explore the relabeling of labels.
  10. Make it beautiful with Grafana.

We will cover the first 7 points in this part of the tutorial and the remaining 3 in later tutorials. So let's start with setting up the Prometheus Docker container.

Installing Docker

If you haven't set up Docker yet, you can use the following commands:

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

Well, that will take care of installing Docker.

If you are using some other OS (like CentOS, RHEL, etc.), you can find the installation guide in the official Docker documentation.

Check that Docker is running:

sudo docker ps

With Docker up and running, let's move on to Prometheus. You can read more about Docker in its official documentation.

Installing the Prometheus container

Let's first create a prometheus.yml inside a config folder. (The Prometheus image looks for /etc/prometheus/prometheus.yml by default, so we use that name.)

You can use the following commands:

mkdir config
touch config/prometheus.yml

To run the Prometheus container on your system, use the following command.

Don't worry, we will configure prometheus.yml and add rules later.

docker run \
-p 9090:9090 \
-v "$(pwd)/config":/etc/prometheus \
prom/prometheus

Each argument is explained below:

  1. -p : maps our port 9090 to port 9090 of the container. Prometheus listens on port 9090 inside the container. The format for the -p argument is <your-port>:<target-container-port>.
  2. -v : maps a local directory or file to a directory or file inside the container. Remember to map a file to a file and a directory to a directory. If the directory does not exist on the machine, it will be created. Using this we map our config directory onto the container's /etc/prometheus, so our prometheus.yml ends up at /etc/prometheus/prometheus.yml. An alternative file-to-file mapping is shown below.
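If you prefer to mount just the single config file instead of the whole directory, a file-to-file mapping works too (a sketch; note that the rules directory we add later would then need its own mount):

docker run \
-p 9090:9090 \
-v "$(pwd)/config/prometheus.yml":/etc/prometheus/prometheus.yml \
prom/prometheus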

Now let's edit the prometheus.yml.

Editing prometheus.yml

Open the prometheus.yml we created in a code editor of your choice.

Now let's add the config. Copy and paste the following snippet:

global:
  scrape_interval: 15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).

rule_files:
  - "prom-rules/*.yaml"

Here scrape_interval tells Prometheus to scrape metrics from the endpoints every 15 seconds, and evaluation_interval tells it to evaluate the alerting rules every 30 seconds.

rule_files is the path (relative to the config file) where Prometheus looks for rules for evaluation and alert creation.
We will add rules later in the article.
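For reference, this is the layout we are building toward inside the config folder (the prom-rules directory and its rules file are created in a later step):

config/
├── prometheus.yml
└── prom-rules/
    └── prometheus-rules.yaml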

Now we will add the targets to scrape metrics from. Let's start with a job named Prometheus which keeps an eye on Prometheus itself.

scrape_configs:
  - job_name: "Prometheus"
    static_configs:
      - targets:
          - "localhost:9090"

static_configs is used to collect data from a fixed list of addresses. localhost:9090 is where our Prometheus is running, so Prometheus will collect data about itself from localhost:9090.

Prometheus collects data from the /metrics endpoint by default. So if we are collecting data from localhost:9090, Prometheus will look for data at localhost:9090/metrics.
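Once the container is up, you can verify the endpoint yourself, for example:

curl -s localhost:9090/metrics | head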

Now let's add some alerts to Prometheus.

Create a folder prom-rules inside the config folder, then create a file called prometheus-rules.yaml inside it and copy the following into the file:

groups:
  - name: prometheus
    rules:
      - alert: PrometheusTargetMissing
        expr: up == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus target missing (instance {{ $labels.instance }})
          description: "A Prometheus target has disappeared. An exporter might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

Let's go through what this is saying:

We create a group named prometheus which has one rule, evaluated as follows:

It defines an alert named PrometheusTargetMissing, which is triggered when any target (which we define in jobs) is missing, i.e. up == 0; for: 0m means there is no waiting period, so it fires immediately. When the alert fires it is labelled with severity = critical.

For some common alerts you can visit https://awesome-prometheus-alerts.grep.to/rules
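Before (re)starting Prometheus, you can sanity-check the rules file with promtool, which ships inside the prom/prometheus image. A quick sketch:

docker run --rm -v "$(pwd)/config":/config \
  --entrypoint promtool prom/prometheus \
  check rules /config/prom-rules/prometheus-rules.yaml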

So now our Prometheus Docker setup is done, with a rule in place.

Next, let's visit the web interface of Prometheus.

Prometheus Web Interface

First, launch the Docker container for Prometheus:

docker run \
-p 9090:9090 \
-v "$(pwd)"/config:/etc/prometheus \
prom/prometheus

Now go to localhost:9090 in your browser.

The first page will be the graph page.

Prometheus graph page

This is the page where you can write your queries and interact with the metrics live.
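For example, you can try a couple of simple queries here (up is built in; the second query uses a metric Prometheus exposes about itself):

# 1 if a target is reachable, 0 if it is not
up

# per-second rate of HTTP requests handled by Prometheus over the last 5 minutes
rate(prometheus_http_requests_total[5m])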

Now click on the Alerts section in the top navigation. This will take you to the alerts page.

Here you will see the list of all alerts, with different colour codes:

  1. Green implies everything is good and no alert in the group is firing.
  2. Yellow implies the condition for an alert is met and it is being monitored for the evaluation period before firing.
  3. Red implies the alert has been fired.

For me it is red, as Prometheus is unable to find two targets, which we will add later.

Next, click on Status. Here you can see various tabs. Let's list them:

  1. Runtime & Build Information: contains the current runtime stats of Prometheus and its build information.
  2. TSDB Status: the TSDB (Time Series Database) is where Prometheus saves the data collected from the various targets. This tab analyses that data and shows some stats over it.
  3. Command-Line Flags: the arguments given to Prometheus at startup. Useful if you want to check whether arguments are being passed properly to the Prometheus container.
  4. Configuration: the current config of Prometheus.
  5. Rules: the prom-rules we defined earlier. It also provides extra information about the state of each rule and when it was last evaluated.
  6. Targets: contains information about all the enlisted targets, such as whether a target is up and running, when it was last scraped, and any error during the scrape.
  7. Service Discovery: very useful; contains information about the discovered labels for the various jobs.

Now let's add node-exporter, which monitors the status of nodes and exposes various metrics.

First, let's address a problem: writing docker run commands becomes tedious as the number of containers grows. So we will write a docker-compose file which can maintain the containers for us.

Docker compose file

version: '3.4'

services:
  prometheus:
    image: 'prom/prometheus:latest'
    container_name: prometheus
    volumes:
      - ./config:/etc/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - '9090:9090'

  cadvisor:
    image: 'google/cadvisor:latest'
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk:/dev/disk/:ro
    ports:
      - '8080:8080'

  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    environment:
      - NODE_ID={{.Node.ID}}
    volumes:
      - /etc/hostname:/etc/nodename:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /mnt/docker-cluster:/mnt/docker-cluster:ro
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/TZ:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.textfile.directory=/etc/node-exporter/'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      # no collectors are explicitly enabled here, because the defaults are just fine,
      # see https://github.com/prometheus/node_exporter
      # disable the ipvs collector because it fills the node-exporter logs with errors on my CentOS 7 VMs
      - '--no-collector.ipvs'
    ports:
      - 9100:9100

In a nutshell, this compose file will:

  1. create a prometheus container and expose it on port 9090
  2. create a node-exporter container and expose it on port 9100
  3. create a cadvisor container and expose it on port 8080
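Save it as docker-compose.yml next to the config folder, then bring the whole stack up in the background with:

docker compose up -d

(Use docker-compose up -d if you are on the older standalone Compose binary.)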

We will discuss node-exporter next.

Node Exporter

Node exporter is an exporter maintained by the Prometheus project and can be used to export (or collect) details about the node. It is a

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors

You can read more at the node_exporter GitHub repository: https://github.com/prometheus/node_exporter

Let's review the node-exporter service we entered in the docker-compose file earlier.

node-exporter:
  image: quay.io/prometheus/node-exporter:latest
  container_name: node-exporter
  environment:
    - NODE_ID={{.Node.ID}}
  volumes:
    - /etc/hostname:/etc/nodename:ro
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /mnt/docker-cluster:/mnt/docker-cluster:ro
  command:
    - '--path.procfs=/host/proc'
    - '--path.sysfs=/host/sys'
    - '--collector.textfile.directory=/etc/node-exporter/'
    - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
    # see https://github.com/prometheus/node_exporter
    - '--no-collector.ipvs'
  ports:
    - 9100:9100

We map various volumes, which lets the node-exporter container extract details about the node.

The container's default port is 9100; we map port 9100 on our local machine to 9100 on the container.

Node-exporter exposes the metrics at the /metrics endpoint.

You can access the endpoint at localhost:9100/metrics. Prometheus scrapes these metrics.
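Note that Prometheus only scrapes targets listed under scrape_configs, so the exporters have to be added there too. A minimal sketch, assuming the container names from the compose file above (inside the compose network, containers reach each other by service name):

scrape_configs:
  - job_name: "Prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]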

Now let's update our rules file with node-exporter rules.

Node-Exporter Rules

Edit the file prometheus-rules.yaml.

To add rules, go to https://awesome-prometheus-alerts.grep.to/rules, copy the whole node-exporter section, and paste it into the file.

This is a fast way to add common alerts for many exporters.

Now you can add rules for the various exporters.

Now let's discuss how alerts are generated.

Alerts

One of the rules defined above is as follows:

- alert: HostOutOfMemory
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Host out of memory (instance {{ $labels.instance }})
    description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

So when the expression node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 is satisfied, an alert named HostOutOfMemory is raised.
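For example, if a node reports 1.2 GB of available memory out of 16 GB total, the expression evaluates to 1.2 / 16 * 100 = 7.5, which is below 10, so the alert becomes pending and fires once the condition has held for the full 2 minutes.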

The alert can have 3 states:

  1. Green (inactive) means the alert is not active.
  2. Yellow (pending) means the condition has been satisfied and the alert is waiting out the for: duration before firing.
  3. Red (firing) means the alert has fired.

When an alert fires it is sent to Alertmanager, but we will discuss Alertmanager later.

Hope you liked the article. If you have any suggestions, or if I have made any mistakes, please let me know.
