DevOps.dev

Persistent Data Structures in VictoriaMetrics (Part 2): vmselect

Series Introduction

VictoriaMetrics is an open-source, high-performance time series database. It can serve as an alternative to Prometheus or as long-term storage for Prometheus, and many companies have adopted it in production. While VictoriaMetrics provides detailed documentation and examples, there is little discussion of how it persists data on disk.

Given the increasing number of active members in the community, it would be beneficial to have a blog post that serves as a bridge between the user guide and the source code, helping individuals make more meaningful contributions.

This series aims to provide insights into how VictoriaMetrics organizes and operates on its on-disk data. It does not require any prior knowledge of the Go programming language, although a basic understanding of VictoriaMetrics’ components helps.

1. Persistent Data Structures in VictoriaMetrics (Part 1): vmagent
2. Persistent Data Structures in VictoriaMetrics (Part 2): vmselect

vmselect Intro

vmselect is the query component of VictoriaMetrics. Typically, it sits behind a load balancer. It is responsible for querying multiple vmstorage nodes, merging the response data, caching the merged result locally, and returning it to the user.

Rollup Result Cache

A time-series query may retrieve tons of data points. Those data points need to be aggregated before they are shown to the user. A “rollup” refers to a time series that is aggregated over time. It is computed from the data points, an interval, and an aggregation function such as sum or max.

In vmselect, the rollup result is cached in the RollupResultCache. Consider an example: suppose we have executed a PromQL query over the time range [a, b], and we then execute it again over the time range [a+5, b+5]:

  1. The result cached in the RollupResultCache can serve the time range [a+5, b].
  2. vmselect has to query vmstorage for the data of [b+1, b+5].

vmselect will collect and aggregate result data from multiple vmstorage nodes, and merge them with the RollupResultCache, generating the new rollup result as a response. The new result will also be stored in the RollupResultCache.
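The example above can be sketched in a few lines of Go. This is only an illustration of the range arithmetic, not vmselect's actual code: the timeRange type and the splitQuery function are hypothetical names, time is treated as integer units (which is why the missing tail starts at b+1), and only the case where the query starts inside the cached range is handled.

```go
package main

import "fmt"

// timeRange is a closed interval [start, end] in arbitrary integer time units.
// It is a hypothetical type used only for this sketch.
type timeRange struct{ start, end int64 }

// splitQuery reports which part of query can be served from the cached range
// (hit) and which tail still has to be fetched from storage (miss).
// A nil pointer means "nothing". Assumption: only a cached prefix is reused.
func splitQuery(cached, query timeRange) (hit, miss *timeRange) {
	if query.start < cached.start || query.start > cached.end {
		return nil, &query // no usable overlap: fetch everything
	}
	if query.end <= cached.end {
		return &query, nil // fully covered by the cache
	}
	return &timeRange{query.start, cached.end}, &timeRange{cached.end + 1, query.end}
}

func main() {
	a, b := int64(100), int64(200)
	hit, miss := splitQuery(timeRange{a, b}, timeRange{a + 5, b + 5})
	fmt.Println(*hit, *miss) // {105 200} {201 205}
}
```

With a = 100 and b = 200, the cached part is [105, 200] and only [201, 205] has to be requested from vmstorage.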

Although the blog post’s title is “Persistent Data Structures,” we will still briefly introduce in-memory data structures. Most of the time, the RollupResultCache resides in memory with the following features:

  1. It consists of two key-value data structures, representing hot data and cold data.
  2. Cold data is checked every 60 seconds and freed if at most 10% of queries accessed it. Additionally, hot data is periodically downgraded to cold data.
  3. If an item in the cold data is accessed, it can be moved back to the hot data.
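The three rules above can be illustrated with a small two-generation cache. This is a minimal sketch under assumptions, not the real implementation: the workingSet type and its methods are hypothetical, plain maps stand in for the real key-value structure, and rotate and expire are called by hand instead of by timers. What happens to the old cold generation on rotation is also an assumption (here it is simply dropped).

```go
package main

import "fmt"

// workingSet is a hypothetical two-generation cache: hot and cold.
type workingSet struct {
	hot, cold map[string]string
	requests  int // lookups since the last expire call
	coldHits  int // lookups answered from cold since the last expire call
}

func newWorkingSet() *workingSet {
	return &workingSet{hot: map[string]string{}, cold: map[string]string{}}
}

func (w *workingSet) Set(k, v string) { w.hot[k] = v }

// Get looks in hot first, then in cold. A cold hit is copied back to hot.
func (w *workingSet) Get(k string) (string, bool) {
	w.requests++
	if v, ok := w.hot[k]; ok {
		return v, true
	}
	v, ok := w.cold[k]
	if ok {
		w.coldHits++
		w.hot[k] = v // promote the item back to hot
	}
	return v, ok
}

// rotate downgrades hot to cold. Assumption: the previous cold data is dropped.
func (w *workingSet) rotate() {
	w.cold, w.hot = w.hot, map[string]string{}
}

// expire would run every 60 seconds in the scheme described above.
// It frees cold if cold served at most 10% of the lookups since the last check.
func (w *workingSet) expire() {
	if float64(w.coldHits) <= 0.1*float64(w.requests) {
		w.cold = map[string]string{}
	}
	w.requests, w.coldHits = 0, 0
}

func main() {
	w := newWorkingSet()
	w.Set("q1", "r1")
	w.rotate() // q1 is now cold
	v, ok := w.Get("q1")
	fmt.Println(v, ok, len(w.hot)) // r1 true 1
}
```

The point of the two generations is that nothing has to track per-item timestamps: an item survives as long as it keeps being promoted, and everything else disappears when the cold generation is freed or replaced.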

FastCache

When vmselect is about to exit, the hot data in the RollupResultCache will be persisted to disk. The data structure that stores the hot data (in fact, both hot data and cold data use the same data structure) is called FastCache.

FastCache is composed of multiple buckets. Each bucket contains a ring buffer and a hash index. The ring buffer is responsible for storing encoded key-value pairs, while the hash index records the position of a specific key in the ring buffer as an index.
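To make the bucket layout more concrete, here is a simplified sketch of a single bucket. It is not FastCache's code: the bucket and slot types, the entry encoding (two 2-byte lengths followed by key and value), the FNV hash, and the generation counter used to detect overwritten entries are all assumptions chosen to keep the example short.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// slot is an index entry: where the pair starts and during which lap of the
// ring buffer it was written.
type slot struct {
	off int
	gen uint64
}

// bucket is a hypothetical single bucket: a ring buffer plus a hash index.
type bucket struct {
	buf   []byte          // ring buffer with encoded key-value pairs
	head  int             // next write offset
	gen   uint64          // current lap, incremented on wrap-around
	index map[uint64]slot // key hash -> position in buf
}

func newBucket(size int) *bucket {
	return &bucket{buf: make([]byte, size), gen: 1, index: map[uint64]slot{}}
}

func hashKey(k []byte) uint64 {
	h := fnv.New64a()
	h.Write(k)
	return h.Sum64()
}

// Set appends the pair at head, wrapping to the start when it does not fit.
func (b *bucket) Set(k, v []byte) bool {
	n := 4 + len(k) + len(v)
	if len(k) > 0xffff || len(v) > 0xffff || n > len(b.buf) {
		return false // entry cannot be encoded or stored
	}
	if b.head+n > len(b.buf) {
		b.head = 0 // wrap around: old entries get overwritten from the start
		b.gen++
	}
	p := b.buf[b.head:]
	binary.BigEndian.PutUint16(p[0:2], uint16(len(k)))
	binary.BigEndian.PutUint16(p[2:4], uint16(len(v)))
	copy(p[4:], k)
	copy(p[4+len(k):], v)
	b.index[hashKey(k)] = slot{off: b.head, gen: b.gen}
	b.head += n
	return true
}

// Get follows the index into the ring buffer and rejects overwritten entries.
func (b *bucket) Get(k []byte) ([]byte, bool) {
	h := hashKey(k)
	s, ok := b.index[h]
	if !ok {
		return nil, false
	}
	// Live entries: written in this lap, or in the previous lap at an offset
	// that the current lap has not reached yet.
	live := (s.gen == b.gen && s.off < b.head) || (s.gen+1 == b.gen && s.off >= b.head)
	if !live {
		delete(b.index, h)
		return nil, false
	}
	p := b.buf[s.off:]
	kl := int(binary.BigEndian.Uint16(p[0:2]))
	vl := int(binary.BigEndian.Uint16(p[2:4]))
	if !bytes.Equal(p[4:4+kl], k) {
		return nil, false // hash collision with a different key
	}
	return append([]byte(nil), p[4+kl:4+kl+vl]...), true
}

func main() {
	b := newBucket(16)
	b.Set([]byte("k1"), []byte("v1"))
	b.Set([]byte("k2"), []byte("v2"))
	b.Set([]byte("k3"), []byte("v3")) // wraps around and overwrites k1
	_, ok1 := b.Get([]byte("k1"))
	v2, _ := b.Get([]byte("k2"))
	v3, _ := b.Get([]byte("k3"))
	fmt.Println(ok1, string(v2), string(v3)) // false v2 v3
}
```

Because old entries are overwritten rather than freed one by one, eviction costs nothing extra: the oldest data is simply whatever the write position reaches next.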

When the cache is persisted:

  1. All ring buffers will be compressed with the Snappy algorithm and written into files named data.n.bin.
  2. The capacity of the ring buffer will be written into a metadata.bin file. This helps verify that the persisted data is complete during the next restart.

The folder structure of the persisted data looks like:

./rollupResult
├── data.0.bin
├── data.1.bin
├── data.2.bin
├── data.3.bin
└── metadata.bin
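The save-and-verify idea can be sketched with the Go standard library alone. Several things here are assumptions made for illustration: compress/flate is swapped in for Snappy so that the example needs no third-party package, metadata.bin is assumed to hold a single 8-byte capacity value, and saveBuckets, loadBuckets, and roundTrip are hypothetical names rather than FastCache's API.

```go
package main

import (
	"compress/flate"
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// saveBuckets writes every ring buffer to data.<i>.bin (compressed) and the
// ring buffer capacity to metadata.bin. flate stands in for Snappy here.
func saveBuckets(dir string, bufs [][]byte, capacity uint64) error {
	for i, buf := range bufs {
		f, err := os.Create(filepath.Join(dir, fmt.Sprintf("data.%d.bin", i)))
		if err != nil {
			return err
		}
		zw, err := flate.NewWriter(f, flate.BestSpeed)
		if err == nil {
			_, err = zw.Write(buf)
		}
		if err == nil {
			err = zw.Close() // flushes the compressed stream
		}
		if cerr := f.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			return err
		}
	}
	meta := make([]byte, 8)
	binary.BigEndian.PutUint64(meta, capacity)
	return os.WriteFile(filepath.Join(dir, "metadata.bin"), meta, 0o644)
}

// loadBuckets reads n buckets back and rejects any bucket whose size does not
// match the capacity recorded in metadata.bin.
func loadBuckets(dir string, n int) ([][]byte, error) {
	meta, err := os.ReadFile(filepath.Join(dir, "metadata.bin"))
	if err != nil {
		return nil, err
	}
	if len(meta) != 8 {
		return nil, fmt.Errorf("metadata.bin: unexpected size %d", len(meta))
	}
	capacity := binary.BigEndian.Uint64(meta)
	bufs := make([][]byte, n)
	for i := range bufs {
		f, err := os.Open(filepath.Join(dir, fmt.Sprintf("data.%d.bin", i)))
		if err != nil {
			return nil, err
		}
		buf, err := io.ReadAll(flate.NewReader(f))
		f.Close()
		if err != nil {
			return nil, err
		}
		if uint64(len(buf)) != capacity {
			return nil, fmt.Errorf("data.%d.bin: got %d bytes, want %d", i, len(buf), capacity)
		}
		bufs[i] = buf
	}
	return bufs, nil
}

// roundTrip saves the buffers into a temporary directory and loads them back.
func roundTrip(bufs [][]byte, capacity uint64) ([][]byte, error) {
	dir, err := os.MkdirTemp("", "rollupResult")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := saveBuckets(dir, bufs, capacity); err != nil {
		return nil, err
	}
	return loadBuckets(dir, len(bufs))
}

func main() {
	got, err := roundTrip([][]byte{[]byte("abcd"), []byte("wxyz")}, 4)
	fmt.Println(len(got), err) // 2 <nil>
}
```

If the recorded capacity and the restored data disagree, the loader returns an error. Starting with an empty cache is a safe fallback in that case, since the cache only holds results that can be recomputed.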

Extra

While querying uncached data, vmselect retrieves a large amount of data from vmstorage. This data is staged in a tmpBlocksFile, which can spill to disk. A tmpBlocksFile contains a []byte buffer, whose size is configured automatically based on available memory, and an *os.File that points to a temporary file. Retrieved data is appended to the buffer, and the buffer is flushed to the temporary file when it is full.

./tmp
└── searchResults
    └── 2400906475

Those tmpBlocksFiles will be read and aggregated into the rollup result in parallel, and then merged with the RollupResultCache.
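The buffer-then-spill behaviour of a tmpBlocksFile can be sketched as follows. This is a simplified illustration under assumptions: tmpBlocks, blockAddr, and the method names are hypothetical, the buffer size is passed in by hand instead of being derived from available memory, and the temporary file is created in the default temp directory.

```go
package main

import (
	"fmt"
	"os"
)

// blockAddr locates one block inside the logical stream of written data.
type blockAddr struct {
	offset int64
	size   int
}

// tmpBlocks is a hypothetical, simplified stand-in for tmpBlocksFile.
type tmpBlocks struct {
	buf    []byte   // in-memory buffer
	f      *os.File // temporary file, created on the first spill
	offset int64    // total number of bytes accepted so far
}

func newTmpBlocks(bufSize int) *tmpBlocks {
	return &tmpBlocks{buf: make([]byte, 0, bufSize)}
}

// flush moves the buffered bytes into the temporary file.
func (t *tmpBlocks) flush() error {
	if t.f == nil {
		f, err := os.CreateTemp("", "searchResults")
		if err != nil {
			return err
		}
		t.f = f
	}
	if _, err := t.f.Write(t.buf); err != nil {
		return err
	}
	t.buf = t.buf[:0]
	return nil
}

// write appends b and returns its address. If b does not fit into the
// remaining buffer space, the buffer is flushed to disk first.
func (t *tmpBlocks) write(b []byte) (blockAddr, error) {
	if len(t.buf)+len(b) > cap(t.buf) {
		if err := t.flush(); err != nil {
			return blockAddr{}, err
		}
	}
	t.buf = append(t.buf, b...)
	a := blockAddr{offset: t.offset, size: len(b)}
	t.offset += int64(len(b))
	return a, nil
}

// finalize must be called before reading. If nothing was spilled, the data
// simply stays in memory and no file is ever created.
func (t *tmpBlocks) finalize() error {
	if t.f == nil {
		return nil
	}
	return t.flush()
}

// read returns the block at a, from memory or from the temporary file.
func (t *tmpBlocks) read(a blockAddr) ([]byte, error) {
	if t.f == nil {
		return t.buf[a.offset : a.offset+int64(a.size)], nil
	}
	p := make([]byte, a.size)
	_, err := t.f.ReadAt(p, a.offset)
	return p, err
}

// close removes the temporary file, if one was created.
func (t *tmpBlocks) close() {
	if t.f != nil {
		t.f.Close()
		os.Remove(t.f.Name())
	}
}

func main() {
	t := newTmpBlocks(8)
	defer t.close()
	a1, _ := t.write([]byte("hello"))
	a2, _ := t.write([]byte("world!")) // does not fit: "hello" is spilled to disk
	if err := t.finalize(); err != nil {
		panic(err)
	}
	b1, _ := t.read(a1)
	b2, _ := t.read(a2)
	fmt.Println(string(b1), string(b2), t.f != nil) // hello world! true
}
```

Small query results never touch the disk in this scheme, while large ones use a bounded amount of memory. Since reads go by address, several readers can fetch different blocks at the same time.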

Further Reading

You can find the corresponding code for:

Published in DevOps.dev

Devops.dev is a community of DevOps enthusiasts sharing insight, stories, and the latest development in the field.

Written by Zhu Jiekun

Observability: Achieving production excellence.
