A quick glance at pg_basebackup compression

Enterprise PostgreSQL Solutions

1. Overview

pg_basebackup is a powerful tool for creating physical backups of PostgreSQL database clusters. Unlike pg_dump, which generates logical backups, pg_basebackup captures the entire cluster state. These backups are crucial for point-in-time recovery or for setting up a standby server.

2. Backup Compression

Efforts to improve backup performance have led to features such as parallel processing and support for additional compression algorithms. Starting with PostgreSQL 15, the --compress option also lets you specify where compression happens: on the server or on the client. For example:

pg_basebackup -h localhost -D bak1 -Ft --compress=server-gzip:9

Here, the PostgreSQL server compresses the data before sending it to the client. This setup suits cases where network bandwidth is the bottleneck and the server has spare CPU capacity.

Alternatively, you can shift the compression load to the pg_basebackup client using:

pg_basebackup -h localhost -D bak2 -Ft --compress=client-gzip:9

This approach minimizes CPU consumption on the server side but demands more network bandwidth.
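In PostgreSQL 15 and later, the --compress value takes the form [{client|server}-]method[:detail], where method is gzip, lz4, zstd, or none, and an integer detail is read as the compression level. The sketch below is a small bash helper, compress_opt (a hypothetical name, not part of PostgreSQL), that assembles such a value and catches typos before pg_basebackup is invoked. It only checks that the level is a plain integer; the valid range depends on the method and is left for pg_basebackup to enforce.

```shell
#!/usr/bin/env bash
# compress_opt LOCATION METHOD LEVEL
# Hypothetical helper (not part of PostgreSQL): prints a --compress value in
# the PostgreSQL 15+ form [{client|server}-]method[:detail].
compress_opt() {
  local location=$1 method=$2 level=$3
  case "$location" in
    server|client) ;;
    *) echo "location must be 'server' or 'client'" >&2; return 1 ;;
  esac
  case "$method" in
    gzip|lz4|zstd) ;;
    *) echo "method must be gzip, lz4 or zstd" >&2; return 1 ;;
  esac
  # Only reject values that are empty or contain non-digits; pg_basebackup
  # itself rejects levels outside the range the chosen method supports.
  case "$level" in
    ''|*[!0-9]*) echo "level must be an integer" >&2; return 1 ;;
  esac
  # "--" stops printf from treating the leading dashes as an option.
  printf -- '--compress=%s-%s:%s\n' "$location" "$method" "$level"
}

# Print the two commands discussed in this post.
echo pg_basebackup -h localhost -D bak1 -Ft "$(compress_opt server gzip 9)"
echo pg_basebackup -h localhost -D bak2 -Ft "$(compress_opt client gzip 9)"
```

Switching between server-side and client-side compression then only means changing the first argument, for example compress_opt server zstd 3.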

To see how the two options compare in practice, I ran a simple speed test: I created a table, inserted 100 million rows, and took a backup with each option. Because the server and the client ran on the same machine, there was no significant difference in elapsed time.

Commands Used:

psql -d postgres

postgres=# CREATE TABLE t(key int, value text);
CREATE TABLE
postgres=# insert into t values(generate_series(1, 100000000), 'hello world');
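INSERT 0 100000000

### Compression on the PostgreSQL server side:
time pg_basebackup -h localhost -D bak1 -Ft --compress=server-gzip:9

real	8m44.789s
user	0m0.069s
sys	0m0.481s

### Compression on the pg_basebackup (client) side:
time pg_basebackup -h localhost -D bak2 -Ft --compress=client-gzip:9

real	8m40.868s
user	8m40.040s
sys	0m0.672s

The user and sys figures reported by time cover only the pg_basebackup process, not the server. With server-gzip, the client spent almost no CPU (0.069 s of user time) because the server did the compressing. With client-gzip, the client's user time (8m40.040s) accounts for nearly the entire run, because pg_basebackup compressed the data itself.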
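To repeat this comparison on your own hardware, the two runs can be scripted. The sketch below assumes bash, a PostgreSQL 15 or later server on localhost that the current user is allowed to back up, and that the bak1 and bak2 directories do not exist yet. run_backup is a hypothetical helper: it uses bash's TIMEFORMAT variable to print real, user, and sys time for each run and du to report the size of the result. As a safety measure it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: repeat the server-side vs client-side gzip comparison.
# Assumes PostgreSQL 15+ on localhost and that bak1/bak2 do not exist yet.
# Prints the commands only; set DRY_RUN=0 to actually take the backups.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}

# run_backup DIR SPEC: hypothetical helper that takes one tar-format backup
# into DIR with --compress=SPEC and reports its timing and on-disk size.
run_backup() {
  local dir=$1 spec=$2
  local cmd=(pg_basebackup -h localhost -D "$dir" -Ft "--compress=$spec")
  if [ "$DRY_RUN" != 0 ]; then
    echo "${cmd[*]}"
    return 0
  fi
  # The time keyword counts CPU used by the pg_basebackup process only, not
  # by the server, so "user" shows how much compression ran on the client.
  local TIMEFORMAT="$spec: real %Rs, user %Us, sys %Ss"
  time "${cmd[@]}"
  # du -sk prints the directory size in kilobytes, followed by the name.
  echo "$spec: $(du -sk "$dir" | cut -f1) KB on disk"
}

run_backup bak1 server-gzip:9
run_backup bak2 client-gzip:9
```

Saved under a name of your choosing (compare.sh, say), DRY_RUN=0 ./compare.sh takes both backups. Other methods or levels, such as server-zstd:3, can be compared by adding more run_backup lines with fresh directory names.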

3. Summary

The flexibility of pg_basebackup, with its array of parameters and compression options, allows you to fine-tune backups to meet your specific needs. Thorough testing and experimentation will help identify the optimal configuration for your daily backup operations.