Backup Label in PostgreSQL

1. Overview

When I was working on some backup and recovery features for a project based on Postgres, I noticed a file called backup_label. A quick Google search turns up some very nice blogs and books that discuss this topic, such as The Internals of PostgreSQL, one of my favourite books. In this blog, I am going to talk about it a little more based on my experience.

2. What is backup_label?

The backup_label is a file created in the $PGDATA folder when an exclusive backup is started by pg_start_backup(). It stays there while the backup is in progress and is removed once pg_stop_backup() is executed. The exclusive backup is one of the backup methods introduced into Postgres early on, and as the name indicates, it does not support multiple backups running at the same time. To remove this limitation, Postgres later added non-exclusive backups. The frontend backup tool pg_basebackup uses this mode, and since PostgreSQL 9.6 pg_start_backup() can also start a non-exclusive backup when its exclusive argument is set to false. Multiple non-exclusive backups can run at the same time. Both backup methods use the backup_label, but in different ways. (Exclusive mode was deprecated in 9.6 and removed in PostgreSQL 15, where the functions were renamed to pg_backup_start() and pg_backup_stop().)

In an exclusive backup, the backup_label is generated automatically on the source server. To see what this file looks like, run a command such as select pg_start_backup('first backup'); from a psql console. You should then find a backup_label file in the $PGDATA folder with content like the following:

START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
BACKUP METHOD: pg_start_backup
START TIME: 2021-10-15 13:30:03 PDT
LABEL: first backup
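To make the format concrete, here is a small Python sketch that parses the listing above. It is only an illustration: parse_backup_label, parse_lsn and wal_file_name are hypothetical helpers, not part of Postgres, and the sketch assumes the default 16 MB WAL segment size and timeline 1 (the first eight hex digits of the file name). It shows that the file name in parentheses is derived from the START WAL LOCATION: LSN 0/6000028 falls in segment 6, and the name is the timeline, the segment's 4 GB range, and the segment within that range, each printed as eight hex digits.

```python
import re

def parse_backup_label(text: str) -> dict:
    """Hypothetical helper: collect the 'KEY: value' lines of a backup_label."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # ignore blank or malformed lines
            fields[key] = value
    return fields

def parse_lsn(lsn: str) -> int:
    """Turn an LSN printed as 'X/Y' (two 32-bit hex halves) into one integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_file_name(timeline: int, lsn: int, seg_size: int = 16 * 1024 * 1024) -> str:
    """Name of the WAL segment containing lsn (assumes 16 MB segments by default)."""
    segno = lsn // seg_size                  # segment number counted from LSN 0
    segs_per_id = 0x100000000 // seg_size    # segments per 4 GB range
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"

label = """START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
BACKUP METHOD: pg_start_backup
START TIME: 2021-10-15 13:30:03 PDT
LABEL: first backup
"""

fields = parse_backup_label(label)
match = re.match(r"(\S+) \(file (\w+)\)", fields["START WAL LOCATION"])
start_lsn = parse_lsn(match.group(1))
print(fields["LABEL"])                                # first backup
print(wal_file_name(1, start_lsn))                    # 000000010000000000000006
print(wal_file_name(1, start_lsn) == match.group(2))  # True
```

Splitting on ": " rather than ":" keeps values such as the START TIME intact, since its clock digits contain colons without a following space.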

3. How does it work?

In exclusive backup mode, the Postgres source server generates this file when pg_start_backup() is executed and removes it after pg_stop_backup(). In non-exclusive backup mode, however, such as when the pg_basebackup client performs a base backup, the backup_label content is only sent to the client and is never physically saved on the source Postgres server. Likewise, with the low-level non-exclusive API, pg_stop_backup(false) returns the label content, and the user must write it into the backup.

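The backup_label records checkpoint information similar to what pg_controldata shows for the pg_control file. START WAL LOCATION is the REDO point of the checkpoint that pg_start_backup() performed, and the full file also contains a CHECKPOINT LOCATION line pointing at that checkpoint record. If a backup is restored with this backup_label file present, Postgres starts the REDO process from the checkpoint in backup_label rather than the one in pg_control. The reason is that more checkpoints can happen while the files are being copied. As a result, the copied pg_control may point at a checkpoint newer than the start of the backup, and starting REDO there would skip WAL needed to make the copied files consistent. Once recovery has taken this information over into pg_control, the file is renamed to backup_label.old. A crash later during recovery then restarts from the latest restartpoint instead of going back to the start of the backup. In short, backup_label gives the database a consistent checkpoint from which to replay WAL from the archive.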
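The choice of starting point can be modelled in a few lines of Python. This is a simplified sketch, not Postgres's actual recovery code, which lives in the server's C sources. choose_redo_start is a hypothetical function, control_redo_lsn stands in for the checkpoint found in the copied pg_control, and the sketch assumes redo begins at START WAL LOCATION, which is the REDO point of the checkpoint pg_start_backup() performed.

```python
import os
import tempfile

def parse_lsn(lsn: str) -> int:
    """Turn an LSN printed as 'X/Y' (two 32-bit hex halves) into one integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def choose_redo_start(data_dir: str, control_redo_lsn: int) -> int:
    """Simplified model: pick the LSN where REDO should begin.

    control_redo_lsn stands in for the checkpoint recorded in the copied
    pg_control, which may be newer than the checkpoint the backup started from.
    """
    label_path = os.path.join(data_dir, "backup_label")
    if not os.path.exists(label_path):
        return control_redo_lsn  # no label: trust pg_control
    prefix = "START WAL LOCATION: "
    with open(label_path) as f:
        for line in f:
            if line.startswith(prefix):
                redo_lsn = parse_lsn(line[len(prefix):].split()[0])
                break
        else:
            raise ValueError("backup_label has no START WAL LOCATION line")
    # The label has done its job; keep it as backup_label.old so a later
    # crash does not send recovery back to the start of the backup.
    os.replace(label_path, os.path.join(data_dir, "backup_label.old"))
    return redo_lsn

with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "backup_label"), "w") as f:
        f.write("START WAL LOCATION: 0/6000028 (file 000000010000000000000006)\n")
    # Suppose a later checkpoint (REDO at 0/8000028) was copied into pg_control.
    print(hex(choose_redo_start(data_dir, 0x8000028)))  # 0x6000028
    print(os.listdir(data_dir))                          # ['backup_label.old']
```

The LSN 0/8000028 is an invented value for a checkpoint that happened mid-copy. The model prefers the label whenever it exists, which is exactly the behaviour described above.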

4. Does it impact any frontend tool?

The answer is yes. Some frontend tools behave differently when a backup_label file is present. For example, if pg_ctl finds a backup_label file during a smart shutdown, it prints a warning like the one below, and the shutdown waits until the file is removed by pg_stop_backup():

WARNING: online backup mode is active
Shutdown will not complete until pg_stop_backup() is called
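The pg_ctl behaviour boils down to a file-existence check. The sketch below is a hypothetical Python model of that check, not pg_ctl's real code, which is written in C and may test additional conditions. shutdown_warning and its mode argument are made up for illustration, and the message text is the one quoted above.

```python
import os
import tempfile

# Warning text as quoted in the article.
BACKUP_WARNING = (
    "WARNING: online backup mode is active\n"
    "Shutdown will not complete until pg_stop_backup() is called"
)

def shutdown_warning(data_dir: str, mode: str):
    """Hypothetical model: return the warning a shutdown would print, or None."""
    label_present = os.path.exists(os.path.join(data_dir, "backup_label"))
    if mode == "smart" and label_present:
        return BACKUP_WARNING
    return None

with tempfile.TemporaryDirectory() as data_dir:
    print(shutdown_warning(data_dir, "smart"))  # None: no backup in progress
    # Simulate an exclusive backup in progress by creating an empty label.
    open(os.path.join(data_dir, "backup_label"), "w").close()
    print(shutdown_warning(data_dir, "smart"))  # prints the two-line warning
```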

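Another example is the frontend tool pg_rewind. It creates a backup_label file in the target data directory to force recovery to start from the last common checkpoint, that is, the last checkpoint before the two servers' timelines diverged.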
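Writing a file is enough to redirect recovery, and the following hypothetical Python sketch shows why by building a label that points at a chosen checkpoint. The function name rewind_style_label and the LSN values are illustrative assumptions. Real backup_label files also carry a CHECKPOINT LOCATION line, which is not shown in the listing earlier in this post, and the exact set of fields pg_rewind writes may vary between versions. Once a file like this sits in the target data directory, recovery starts from the checkpoint it names rather than the one in pg_control.

```python
from datetime import datetime, timezone

def format_lsn(lsn: int) -> str:
    """Print an LSN in Postgres's 'X/Y' style (upper-case hex halves)."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"

def wal_file_name(timeline: int, lsn: int, seg_size: int = 16 * 1024 * 1024) -> str:
    """WAL segment name for lsn, assuming 16 MB segments by default."""
    segno = lsn // seg_size
    segs_per_id = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"

def rewind_style_label(timeline: int, redo_lsn: int, checkpoint_lsn: int) -> str:
    """Hypothetical: build a minimal label naming the checkpoint to recover from."""
    now = datetime.now(timezone.utc)
    lines = [
        f"START WAL LOCATION: {format_lsn(redo_lsn)} "
        f"(file {wal_file_name(timeline, redo_lsn)})",
        f"CHECKPOINT LOCATION: {format_lsn(checkpoint_lsn)}",
        "BACKUP METHOD: pg_rewind",
        f"START TIME: {now:%Y-%m-%d %H:%M:%S %Z}",
    ]
    return "\n".join(lines) + "\n"

# Illustrative values: last common checkpoint at 0/6000060, its REDO point at 0/6000028.
print(rewind_style_label(1, 0x6000028, 0x6000060))
```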

5. Summary

In this blog, I explained how the backup_label file works in Postgres. I believe most end users never need to pay attention to it, but if you do encounter issues related to backup_label, I hope this blog gives you some clues.