While working on some backup and recovery features for a Postgres-based project, I noticed a file called backup_label. A quick Google search turns up some very nice blogs and books that discuss this topic, such as The Internals of PostgreSQL, one of my favourite books. In this blog, I am going to talk about it a little more based on my experience.
2. What is backup_label?
backup_label is a file created in the $PGDATA folder when an exclusive backup is started with pg_start_backup(), and it stays there while the backup is in progress. It is removed once pg_stop_backup() is executed. Exclusive backup is one of the earliest backup methods in Postgres and, as the name indicates, it does not allow more than one backup to run at the same time. Because of this limitation, a frontend backup tool, pg_basebackup, was added to Postgres later. pg_basebackup does allow multiple backups to run at the same time, so this kind of backup is called a non-exclusive backup. Both backup methods use backup_label, but in different ways.
In an exclusive base backup, backup_label is generated automatically on the source server. To see what this file looks like, run a command such as select pg_start_backup('first backup'); from a psql console. You should then find a backup_label file in the $PGDATA folder with content like the following:

    START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
    CHECKPOINT LOCATION: 0/6000060
    BACKUP METHOD: pg_start_backup
    BACKUP FROM: master
    START TIME: 2021-10-15 13:30:03 PDT
    LABEL: first backup
    START TIMELINE: 1
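Since every line of the file is a simple "KEY: value" pair, it is easy to read from a script. Below is a minimal sketch in Python that parses the sample content shown above. The helper names parse_backup_label and lsn_to_int are my own for illustration and are not part of Postgres or any client library.

```python
SAMPLE = """\
START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
CHECKPOINT LOCATION: 0/6000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2021-10-15 13:30:03 PDT
LABEL: first backup
START TIMELINE: 1
"""


def parse_backup_label(text):
    """Turn each 'KEY: value' line into a dict entry (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so timestamps keep their colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def lsn_to_int(lsn):
    """Convert an LSN such as '0/6000060' into a single 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


label = parse_backup_label(SAMPLE)
# The start location line also carries the WAL file name in parentheses.
start_lsn = label["START WAL LOCATION"].split()[0]
checkpoint_lsn = label["CHECKPOINT LOCATION"]
print(start_lsn, checkpoint_lsn, label["LABEL"])
```

Converting the LSNs to integers makes it possible to compare them, for example to confirm that the checkpoint record sits after the start location.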
3. How does it work?
In exclusive backup mode, the Postgres source server generates this file when pg_start_backup() is executed and removes it after pg_stop_backup(). In non-exclusive backup mode, for example when the pg_basebackup client performs a base backup, backup_label is only streamed to the client and is never physically saved on the source Postgres server.
As you can see in the backup_label file above, it contains checkpoint information similar to what pg_controldata reports from the control file. If a backup is used for recovery with this backup_label file present, Postgres uses the checkpoint recorded in backup_label to start the REDO process. The reason is that several checkpoints may happen while the backup is running, so the control file copied into the backup may point to a later checkpoint than the one the backup started from. Once Postgres has taken what it needs from the file during startup, backup_label is renamed to backup_label.old, so a later restart does not begin from the backup's checkpoint again. In simple words, with the backup_label file, the database has a consistent checkpoint to recover from with a proper archive.
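To make the decision described above more concrete, here is a small simulation in Python. It is only a sketch of the idea and not the actual Postgres startup code: the function names choose_redo_checkpoint and retire_backup_label, the temporary directory standing in for $PGDATA, and the control file checkpoint value 0/7000060 are all assumptions made for illustration.

```python
import os
import tempfile

LABEL = "backup_label"


def choose_redo_checkpoint(pgdata, control_file_checkpoint):
    """Return (checkpoint, source), preferring backup_label when it exists."""
    label_path = os.path.join(pgdata, LABEL)
    if os.path.exists(label_path):
        with open(label_path) as f:
            for line in f:
                if line.startswith("CHECKPOINT LOCATION: "):
                    return line.split(": ", 1)[1].strip(), LABEL
    # No label file: fall back to the checkpoint from the control file.
    return control_file_checkpoint, "control file"


def retire_backup_label(pgdata):
    """Rename backup_label to backup_label.old once it has been used."""
    label_path = os.path.join(pgdata, LABEL)
    if os.path.exists(label_path):
        os.replace(label_path, label_path + ".old")


with tempfile.TemporaryDirectory() as pgdata:
    with open(os.path.join(pgdata, LABEL), "w") as f:
        f.write("START WAL LOCATION: 0/6000028 (file 000000010000000000000006)\n")
        f.write("CHECKPOINT LOCATION: 0/6000060\n")
    # Hypothetical later checkpoint that the copied control file points to.
    first = choose_redo_checkpoint(pgdata, "0/7000060")
    retire_backup_label(pgdata)
    second = choose_redo_checkpoint(pgdata, "0/7000060")
    remaining = sorted(os.listdir(pgdata))

print(first, second, remaining)
```

The first call picks the checkpoint from backup_label, while the second call, made after the rename, falls back to the control file value because only backup_label.old is left in the directory.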
4. Does it impact any frontend tool?
The answer is yes. Some frontend tools behave differently if a backup_label file is present. For example, if pg_ctl sees a backup_label file during a smart shutdown, it waits for the file to be removed and shows the end user a warning message like this:

    WARNING: online backup mode is active
    Shutdown will not complete until pg_stop_backup() is called
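If you want to check for this situation from your own script before asking for a shutdown, testing for the file is enough. The snippet below is a rough Python imitation of that check, not pg_ctl's real code; the function name smart_shutdown_warning is hypothetical and a temporary directory stands in for $PGDATA.

```python
import os
import tempfile

WARNING = (
    "WARNING: online backup mode is active\n"
    "Shutdown will not complete until pg_stop_backup() is called"
)


def smart_shutdown_warning(pgdata):
    """Return the warning text if backup_label exists in pgdata, else None."""
    if os.path.isfile(os.path.join(pgdata, "backup_label")):
        return WARNING
    return None


with tempfile.TemporaryDirectory() as pgdata:
    label_path = os.path.join(pgdata, "backup_label")
    before = smart_shutdown_warning(pgdata)   # no backup running yet
    open(label_path, "w").close()             # simulate pg_start_backup()
    during = smart_shutdown_warning(pgdata)
    os.remove(label_path)                     # simulate pg_stop_backup()
    after = smart_shutdown_warning(pgdata)

print(before, during, after, sep="\n")
```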
Another example is the frontend tool pg_rewind, which creates a backup_label file to force recovery to start from the last common checkpoint.
In this blog, I explained how the backup_label file works in Postgres. I believe end users won’t pay attention to it most of the time, but if you do run into issues related to backup_label, I hope this blog gives you some clues.