The Eleph…err…Autovacuum in the room.

Mayur (Do not drink & database)
Published in DevOps.dev · Jan 27, 2023 · 4 min read

A few blogs ago, I argued that autovacuum is the most hardworking employee in the company. At the same time, I complained about Postgres databases being difficult to herd like cattle. Let’s dig a bit deeper into the subject and try to address the elephant in the room: autovacuum is known to be IO-intensive, but how much IO does it actually generate, and why does that matter when we move to a popular cloud platform?

Firstly, let’s see how much IO to expect.
At most, autovacuum can generate IO as shown below:
Max autovacuum IO = (1000 / cost_delay) * (cost_limit / vacuum_cost_page_hit) * 8 KB
For the PG12 default settings:
Max autovacuum IO = (1000 / 2) * (200 / 1) * 8 KB = 781.3 MB per second
(Note: autovacuum_vacuum_cost_delay was reduced drastically from 20 ms to 2 ms in PG12, so our PG10-to-PG12 migration immediately saw autovacuum IO increase.)
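If you want to plug your own settings into this formula, here is a minimal Python sketch of the same arithmetic (the function name and defaults are mine, not anything that ships with Postgres):

BLOCK_SIZE = 8192  # default Postgres page size in bytes

def max_autovacuum_io_mb_per_s(cost_delay_ms=2, cost_limit=200, page_hit_cost=1):
    # One sleep cycle per cost_delay: the worker accrues cost_limit units of work,
    # then sleeps. Charging every page at the cheapest rate (page hit) gives a ceiling.
    cycles_per_second = 1000 / cost_delay_ms
    pages_per_cycle = cost_limit / page_hit_cost
    return cycles_per_second * pages_per_cycle * BLOCK_SIZE / (1024 * 1024)

print(max_autovacuum_io_mb_per_s())                  # PG12 defaults -> ~781 MB/s
print(max_autovacuum_io_mb_per_s(cost_delay_ms=20))  # PG10 defaults -> ~78 MB/s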

On RDS, the default autovacuum_vacuum_cost_limit is a function of the instance class:

autovacuum_vacuum_cost_limit = GREATEST({log(DBInstanceClassMemory/21474836480)*600},200)

So for an r5b.16xlarge instance with 512 GB of memory you end up with a cost limit of roughly 812, which corresponds to a whopping ~3.2 GB/s.
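Here is a quick sketch of that calculation in Python. The helper names are mine, and note that the DBInstanceClassMemory value RDS actually plugs in is somewhat below the nominal class memory, which is why the result lands around 812 rather than what the headline 512 GB would suggest:

import math

BLOCK_SIZE = 8192  # bytes per page

def rds_default_cost_limit(db_instance_class_memory_bytes):
    # RDS default parameter: GREATEST(log(DBInstanceClassMemory/21474836480)*600, 200)
    # log appears to be base 10; that is what reproduces the ~812 figure.
    return max(math.log10(db_instance_class_memory_bytes / 21474836480) * 600, 200)

def max_autovacuum_io_mb_per_s(cost_limit, cost_delay_ms=2, page_hit_cost=1):
    return (1000 / cost_delay_ms) * (cost_limit / page_hit_cost) * BLOCK_SIZE / (1024 * 1024)

print(rds_default_cost_limit(484 * 10**9))  # ~484 GB exposed to the instance (assumed,
                                            # back-solved from the 812 above) -> ~812
print(max_autovacuum_io_mb_per_s(812))      # -> ~3172 MB/s, the ~3.2 GB/s mentioned above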

Now let’s see how much throughput AWS storage offers. First we examine GP2 storage. GP2 storage IOPS, and consequently throughput, are a function of storage size, as shown in the following graph.
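If you don’t have the graph handy, here is the rough shape of the gp2 limits as I understand them at the time of writing (only an approximation; check the current AWS docs for the exact rules):

def gp2_limits(size_gib):
    # ~3 IOPS per GiB, floor of 100, capped at 16,000; throughput is capped lower
    # for small volumes and tops out at 250 MiB/s.
    baseline_iops = min(max(3 * size_gib, 100), 16_000)
    throughput_mib_s = 128 if size_gib <= 170 else 250
    return baseline_iops, throughput_mib_s

for size in (100, 400, 1_000, 9_000):
    print(size, gp2_limits(size))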

On io1 (provisioned IOPS) storage you can get higher throughput (at a higher cost, of course).

So we have a mismatch: storage throughput that increases with volume size versus autovacuum IO that is a function of instance memory. We can overcome this by defining our own cost_limit and cost_delay formula for all databases based on storage size. However, autovacuum throughput requirements put a big dent in the scaling strategy discussed in the previous blog. Thousands of small databases for thousands of micro-services can’t easily be herded like cattle. We need to treat each one as a pet and carefully decide autovacuum parameters according to its size and usage.
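To make the "derive cost settings from storage size" idea concrete, here is a hypothetical sketch. The 30% share reserved for autovacuum and the helper name are mine, not an established rule:

BLOCK_SIZE = 8192  # bytes per page

def autovacuum_cost_limit_for(volume_throughput_mib_s,
                              autovacuum_share=0.3,
                              cost_delay_ms=2,
                              page_hit_cost=1):
    # Pick a cost limit so the theoretical autovacuum ceiling stays within a chosen
    # share of the volume's throughput.
    budget_pages_per_s = volume_throughput_mib_s * 1024 * 1024 * autovacuum_share / BLOCK_SIZE
    cycles_per_s = 1000 / cost_delay_ms
    return int(budget_pages_per_s / cycles_per_s * page_hit_cost)

# A gp2 volume capped at 250 MiB/s, reserving ~30% of it for autovacuum:
print(autovacuum_cost_limit_for(250))  # -> 19; either go this low on cost_limit
                                       #    or raise cost_delay instead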

GAME CHANGER:
AWS announced support for GP3 on RDS. GP3 on RDS offers a 12,000 IOPS / 500 MB/s baseline (at no extra charge) for all volumes beyond 400 GB, and a decent 3,000 IOPS / 125 MB/s for volumes below 400 GB. If your application is not latency-sensitive (it would take another blog to explain this further, but at present we see higher read/write latency on GP3 volumes than on GP2), you could consider GP3 to give autovacuum more room.
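The tier switch described above is simple enough to express directly (a sketch based on the numbers in this paragraph; AWS may change them over time):

def rds_gp3_baseline(size_gb):
    # Below 400 GB: 3,000 IOPS / 125 MB/s; 400 GB and beyond: 12,000 IOPS / 500 MB/s.
    return (3_000, 125) if size_gb < 400 else (12_000, 500)

print(rds_gp3_baseline(100))   # -> (3000, 125)
print(rds_gp3_baseline(9020))  # -> (12000, 500)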

GP3 offers an incredible cost-saving opportunity for vanilla RDS Postgres.

For example, the following are the read IOPS and throughput graphs from one of our very busy 9 TB databases.

Read throughput hits the roof when large tables are autovacuumed.
The corresponding read IOPS are high, but not as high as what we would be forced to pay for on io1 storage.

Below is the cost calculation for the IOPS/throughput needed to keep autovacuum running optimally on this behemoth of a database.

io1 monthly cost for 32K IOPS and 500 MB/s throughput:
0.149 * 9020 (storage) + 0 (throughput) + 32000 * 0.078 (IOPS) = $3,839.98
io1 monthly cost for 64K IOPS and 1000 MB/s throughput:
0.149 * 9020 (storage) + 0 (throughput) + 64000 * 0.078 (IOPS) = $6,335.98
gp3 monthly cost for 12K IOPS and 1000 MB/s throughput:
0.0952 * 9020 (storage) + 0.048 * 500 (throughput above the 500 MB/s baseline) + 12000 * 0 (IOPS) = $882.70
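For the record, the same arithmetic as a small script (the per-unit prices are the ones used above and will vary by region and over time):

def monthly_storage_cost(storage_gb, storage_price, paid_iops, iops_price,
                         paid_throughput_mb_s=0, throughput_price=0.0):
    # storage + provisioned IOPS + provisioned throughput, all billed per month
    return (storage_gb * storage_price
            + paid_iops * iops_price
            + paid_throughput_mb_s * throughput_price)

# io1, 32K provisioned IOPS (throughput comes bundled with the IOPS)
print(monthly_storage_cost(9020, 0.149, 32_000, 0.078))        # -> ~3839.98
# io1, 64K provisioned IOPS
print(monthly_storage_cost(9020, 0.149, 64_000, 0.078))        # -> ~6335.98
# gp3, 12K IOPS included, 500 MB/s paid on top of the 500 MB/s baseline
print(monthly_storage_cost(9020, 0.0952, 0, 0.0, 500, 0.048))  # -> ~882.70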

Almost $5K per month per RDS instance saved (if your application's latency requirements allow it).

CONCLUSION:
The best option for us is to start those tiny databases with conservative autovacuum settings for cost_limit and cost_delay, then incrementally tune them. This also opens the Pandora's box of monitoring piling-up dead tuples and table/index bloat and automating the fix, right after posting the meme below in the MySQL/MariaDB DBA team's Slack channel.
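As a starting point for that monitoring, here is a minimal sketch that simply lists the tables with the most dead tuples (the connection string and the alert threshold are placeholders, not our production setup):

import psycopg2  # assumes the psycopg2-binary package is installed

QUERY = """
SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 100000   -- arbitrary threshold; tune per database
ORDER BY n_dead_tup DESC
LIMIT 20;
"""

with psycopg2.connect("host=localhost dbname=postgres user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchall():
            print(row)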
