Prometheus metrics within Kubernetes — an aerial view
An integral element of a system’s architecture is its monitoring tooling. At Mosaic Learning, we use Kubernetes for our infrastructure management and Prometheus as our metrics tool. To get monitoring set up systematically and quickly, we utilize the prometheus-community Helm charts.
As this repository contains a LOT of charts, some of which we use and some of which we do not, I wanted to take some time to understand what each one does and how they interact with each other. Additionally, I wanted to use this as an opportunity to review the full metrics flow within Kubernetes and beyond, into Prometheus itself.
I have created a rough diagram of how metrics flow from Kubernetes into Prometheus (and ultimately Grafana). (Note: the boxes shaded grey represent resources which automatically get deployed together; see the section below on the Kube Prometheus Stack.)
Although a picture is worth a thousand words, let’s walk through this together.
Kubernetes Core metrics
As stated in the documentation, Kubernetes by default exposes metrics (in Prometheus format) for many of its core services at the path /metrics. The kubelet, which has cAdvisor built into its binary, exposes its metrics at /metrics/cadvisor and also defines the endpoints /metrics/probes and /metrics/resource.
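To get a feel for this text exposition format, here is a minimal Python sketch that parses a few metric lines into names, labels, and values. The sample lines are illustrative, not real kubelet output, and a real consumer would use an official Prometheus client library rather than this simplified parser (it does not handle commas or escapes inside quoted label values):

```python
# Illustrative sample in Prometheus text exposition format, similar in shape
# to what the kubelet serves at /metrics (values and labels are made up).
sample = """\
# HELP kubelet_running_pods Number of pods with a running pod sandbox.
# TYPE kubelet_running_pods gauge
kubelet_running_pods 12
container_cpu_usage_seconds_total{container="app",pod="web-0"} 4.2
"""

def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]} from exposition text."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name_part, _, value = line.rpartition(" ")
        labels = {}
        if "{" in name_part:
            name, label_str = name_part.split("{", 1)
            for pair in label_str.rstrip("}").split(","):
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
        else:
            name = name_part
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics

parsed = parse_metrics(sample)
print(parsed["kubelet_running_pods"])  # → [({}, 12.0)]
```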
A very good diagram can be found here. In the diagram, we can see that the kubelet’s metrics get exported via the Summary API to the Metrics Server, which exposes them further to the HPA and kubectl top. Let’s discuss this at a deeper level.
Prometheus Adapter
In actuality, the Summary API is consumed not just by the Metrics Server but anything extending the Kubernetes API Aggregation Layer, hooking into the existing metrics API. (see https://github.com/kubernetes/kube-aggregator). One example of this is the Prometheus Adapter.
If one looks in the prometheus-adapter chart, you’ll see 3 files:
- resource-metrics-apiservice.yaml
- custom-metrics-apiservice.yaml
- external-metrics-apiservice.yaml
Each one of these manifests creates an APIService object, setting its group to metrics.k8s.io, custom.metrics.k8s.io, or external.metrics.k8s.io respectively. We now have an implementation of the Metrics API backed by data from the Prometheus server.
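As a sketch, the APIService object registering the custom metrics group looks roughly like this; the backing service name and namespace are assumptions and depend on how the chart is installed:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io          # the API group this service implements
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter            # assumed service name
    namespace: monitoring               # assumed namespace
```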
Kube State Metrics
Kube State Metrics is another service deployed into the cluster which also queries the Kubernetes API Server. However, kube-state-metrics does not consume the Summary API; rather, it utilizes the client-go client (https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#go-client) to access the cluster (see the API spec). As an example, when detecting the state of a StatefulSet, kube-state-metrics utilizes both the CoreV1().Pods API and the AppsV1().StatefulSets API.
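For a sense of what this looks like in practice, here are a few illustrative kube-state-metrics series for a hypothetical 3-replica StatefulSet named web (the metric names are real; the values and labels are made up):

```
kube_statefulset_replicas{namespace="default",statefulset="web"} 3
kube_statefulset_status_replicas_ready{namespace="default",statefulset="web"} 2
kube_pod_status_phase{namespace="default",pod="web-2",phase="Pending"} 1
```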
The metrics exposed by the Metrics API (such as CPU and memory usage) pertain to the internal functioning of a pod. These metrics are therefore useful for autoscaling purposes (i.e. the Horizontal Pod Autoscaler). When a pod’s CPU/memory usage hits a certain percentage, the HPA knows to add another replica of the pod to the cluster. On the other hand, the metrics exposed by kube-state-metrics are, as the name indicates, used to describe the state of the cluster. Understanding the number of failed pods, unavailable nodes, or any other resource-level event is crucial for the health management of the cluster.
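To sketch how the resource metrics feed autoscaling, here is a minimal HorizontalPodAutoscaler manifest that scales a Deployment on CPU utilization; the Deployment name and thresholds are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app            # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75   # add replicas above 75% average CPU
```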
Prometheus Node Exporter
Prometheus Node Exporter is another crucial element in our metrics architecture. This exporter does not access the Kubernetes API Server at all, and sometimes does not even exist inside the cluster. Even when the kube-prometheus-stack installs it inside the cluster (as one of its dependencies), it does so as a DaemonSet and therefore lives on every node. It has access to and exposes node-level (host) metrics, which live outside the cluster’s own resource model, to aid in monitoring the health of the node itself.
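For reference, the node-level series the node exporter exposes look roughly like the following (the metric names are real node exporter metrics; the values and label sets are illustrative):

```
node_cpu_seconds_total{cpu="0",mode="idle"} 4.53e+06
node_memory_MemAvailable_bytes 8.1e+09
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 2.4e+10
```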
Kube Prometheus Stack
We finally arrive at the kube-prometheus-stack chart. This chart provides a full metrics stack — Prometheus, Grafana, Alertmanager, etc. — for actually pulling, processing, and eventually visualizing (with Grafana) the cluster’s metrics data. Let’s review the contents of this chart a bit to understand how it works.
CRDs
This chart is the only chart in the prometheus-community Helm charts repository which defines Custom Resource Definitions (CRDs). These are located in the /crds subfolder and contain the following definitions, all under the monitoring.coreos.com group:
- AlertManagerConfig
- AlertManager
- PodMonitor
- Probe
- Prometheus
- PrometheusRule
- ServiceMonitor
- ThanosRuler
The main CRD I would like to focus on is the ServiceMonitor. The ServiceMonitor resource is a custom resource used by Prometheus for pulling metrics. In /templates/exporters, you’ll find ServiceMonitors defined for each core component at the metrics paths discussed above. (When a path is not defined, the default is just /metrics.) The Prometheus Operator (see below) watches the ServiceMonitor resources in the cluster and configures Prometheus to scrape metrics from the endpoints they define. For almost all of our applications, we define additional ServiceMonitor resources, and those are automatically picked up by the operator and pulled into Prometheus.
(If you are interested about how operators differ from controllers in Kubernetes, I found this article to be excellent.)
Components and Dependencies
When reviewing the /templates folder, you will see that this chart installs Prometheus, the Prometheus Operator, and Prometheus Alertmanager. Additionally, it installs the Kube State Metrics, Prometheus Node Exporter, and Grafana Helm charts as dependencies (with the first two charts actually being part of the prometheus-community Helm charts repository itself).
The Prometheus Adapter chart, while not an actual dependency of this stack, is also located in the same repository.
Service Monitors
Here is an example of how to use ServiceMonitors to expose custom metrics:
Our application is built on PHP and running php-fpm. In order for us to expose our php-fpm metrics to Prometheus, we would take the following steps:
1. Create a sidecar metrics exporter container in the deployment’s pod spec:
- name: fpm-metrics
  image: hipages/php-fpm_exporter
  imagePullPolicy: IfNotPresent
  ports:
    - name: metrics
      containerPort: 6999
      protocol: TCP
2. Create a Service to expose this port for this pod
apiVersion: v1
kind: Service
metadata:
  name: my-service
  labels:
    type: my-metrics
spec:
  ports:
    - port: 9216
      targetPort: metrics
      protocol: TCP
      name: metrics
  selector:
    app.kubernetes.io/name: example-app
3. Create a Service Monitor to select those Service objects
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-phpfpm-monitor
  labels:
    app.kubernetes.io/name: my-phpfpm-monitor
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
    release: prometheus
spec:
  endpoints:
    - port: metrics
      interval: 30s
      scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  selector:
    matchLabels:
      type: my-metrics
See here for more details on how to set it up, and here for the API spec.
Once this is done, you can configure Prometheus Adapter to generate further custom metrics with its configuration values.
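As a rough sketch of what such a configuration rule could look like, assuming the exporter exposes phpfpm_active_processes and phpfpm_total_processes series, the adapter’s rules could derive a phpfpm_process_utilization metric like this (the exact series names and query are assumptions):

```yaml
rules:
  - seriesQuery: 'phpfpm_active_processes{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^phpfpm_active_processes$"
      as: "phpfpm_process_utilization"
    # ratio of active to total FPM processes, per pod
    metricsQuery: >-
      sum(phpfpm_active_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
      / sum(phpfpm_total_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
```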
Metrics API
As a final point of discussion, I wanted to show how one would manually access metrics from the Metrics API (via the Kubernetes API).
Metrics API
kubectl get --raw "/apis/metrics.k8s.io/v1beta1"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/ubc/pods" | jq '.'
Custom Metrics API
The following example assumes that a custom metric named phpfpm_process_utilization was created.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/phpfpm_process_utilization"
If you want to see a full list of all custom metrics, you can use this:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '. | {metrics: .resources[].name}'
Note: If you run kubectl api-resources | grep metrics, you would expect to see NodeMetrics and PodMetrics for metrics.k8s.io, but nothing for custom.metrics.k8s.io or external.metrics.k8s.io. The reason for this is that custom and external metrics are not Kubernetes resource kinds themselves; rather, those API groups expose the custom metrics data directly.
Conclusion
Kubernetes is an amazing system for the management and networking of application infrastructure. This organization allows it to expose crucial metrics that help ensure system stability. I would definitely recommend installing the Kube Prometheus Stack chart together with the Prometheus Adapter chart to take advantage of what this package can offer.