WAL is one of the most important parts of PostgreSQL., WAL records all the database activity. Hense we can regard wal as a change roadmap of the history of PostgreSQL database, and the crash recovery, logical replication etc aren’t possible without WAL. The following picture describes the various wal related GUC (based on PG12) involved in the production and use of wal logs. It is very important for us to know the meaning of each parameter to optimize database performance and configure high availability cluster.. We can find the definition of each GUC in the PostgreSQL document, but we can hardly understand the intrinsic meaning of the parameter from a few simple lines, or we don’t know why it exists. What’s more, when you configure the database according to someone else’s blog, you find that your database version doesn’t know the configuration parameters in the blog. This blog will start from the PostgreSQL version 7.1 to review the development of wal log.
However, the history is so long, I want to divide it into two parts, first is for (V7.1-V8.3) which is a relatively old version, and another is for (V9.0-V12) which being developed and used.
1. WAL Genesis(V7.1)
I will not describe the POSTGRES project of Berkeley again. The topic today is a prehistoric civilization that is hard to search. Let us look at wal on PostgreSQL7.1 below, so clearly and we can hold everything.
The above diagram describes the WAL’s normal job only. The main concept of WAL is that any changes to data files must be written only after those changes have been logged – that is, when log records have been flushed to permanent storage. In this way, we don’t need to flush the data files in the shared cache from time to time, because if the database crashes, we can load the data from the wal. The purpose of this method is to replace random data writing with sequential writing wal to obtain higher execution efficiency.
“log records have been flushed to permanent storage”, there are similar words in the general introduction of the wal log, it seems easy but PostgreSQL does a dozen things to implement the several words in the purpose of performance, and look at [guc1] in the figure above some GUC related.
Every PostgreSQL backend may produce wal record, the records are written into PostgreSQL share buffer first and ‘wal_buffers’ GUC defines the size of share buffer for wal.
‘fsync’ provides a way to write data into persistent storage finally before completing a writing process. However, this will lead to a decrease in the efficiency of database execution. This parameter is valid not only for the swipe of wal, but also for other data files.
This parameter is a complement to the ‘fsync’ parameter, which specifies the method to ensure the data is written to the persistent storage.
COMMIT_DELAY && COMMIT_SIBLINGS
The brush wal cache is triggered when the transaction is committed. At this time, PostgreSQL does a little more to improve performance: ‘commit_delay’ refers to the time allowed for wal cache to delay writing after the transaction is committed. The purpose of this delay is to wait for parallel brother transactions to be executed. After the brother transactions are committed, the wal logs will be written to disk together. If the brother transactions are not committed after the ‘commit_delay’ time, then current transaction completes the wall brush. The meaning of ‘commit siblings’ is the condition for the current transaction to wait. If the sibling transaction executed in parallel is little than ‘commit_siblings’, the current transaction will not wait.
It is a develop argument, and we can ignore it.
PostgreSQL usually uses the checkpoint process to clean up some obsolete files. For the wal file, some future wal files will be generated at a checkpoint or the old wal files will be deleted. See the related configuration [guc2]. Note that the 7.1 version described here is no longer applicable to the latest PostgreSQL version.
In PostgreSQL version 7.1, if ‘wal_files’ parameter is greater than 0, some wal files will be created in advance during the checkpoint. When the ‘wal_files’ parameter is equal to 0, the wal segments will be created one by one driven by PostgreSQL backends. This parameter was removed at version 7.3, because of a new pre-generation way.
PostgreSQL executes the wal interval of checkpoint once, since the last checkpoint, PostgreSQL will trigger checkpoint again after writing a certain number of wal segments.
2. Change the way of wal pregeneration(7.3)
This version has made minor changes based on version 7.1, mainly changing the method of pre-generate wal segment. During a checkpoint, we do not pre-generate wal segments, but maintain wal segments in the wal generation directory with new policies: ① the number of wal segments is not more than 2 * checkpoint_segments + 1 in principle; ② a traffic peak may cause the backend to generate wal logs quickly, and the number of wal segments may be increased to more than 2 * checkpoint ﹣ segments + 1. ③ When making checkpoints, if the number of wal segments exceeds the limit, the over-limit wal segments will be deleted. If the number of wal segments does not exceed the limit, the old wal segment will be renamed as the future wal segment. Therefore, the ‘wal_file’ parameter is no longer used and has been removed in this version.
3. Convenience warning(V7.4)
There is no functional change in this version other than the warning parameter ‘checkpoint_warnings’ is added.
If checkpoint caused by ‘checkpoint_segments’ is more often than ‘checkpoint_warning’, then a warning will be generated in database log.
4. PITR comming(V8.0)
Finally, the layout of wal has been expanded to realize point in time recovery (PITR). PITR is the physical backup mechanism of PostgreSQL. The main processes are: turn on archive; make a base backup; create recovery.conf file in the base backup database and writing recovery parameters; starting the base backup database. This blog is about history, not about the use of functions.
This parameter provides a wal log archiving method for PostgreSQL.
These are parameters related to the recovery target.
5. Full page write and warm standby(V8.2)
When the operating system crashes, it may lead to partial write of PostgreSQL data pages, so data inconsistency may occur. To deal with this situation, PostgreSQL will back up a data page (full page write) in the wal log after a checkpoint, each time a data page is modified.
Also, this version has the concept of PostgreSQL warm standby, which is based on the wal transfer at the wal segment level. The hot standby that appears later is based on the wal transfer at the wal record level. The warm standby machine periodically obtains the wal segment through the restore command and performs the wal replay, so that the standby machine keeps catching up with the host data.
We can understand the warm standby and PITR here. If you want PITR to stop after wal replays all wal logs and wait for the following wal to appear, this is warm standby. This version of “stop and wait for the next wal” is not implemented in the kernel, but requires you to integrate this step into the ‘restore_command’.
Here comes the [GUC5], this group is the arguments related to configure the contents of wal.
The meaning of this parameter is to enable PostgreSQL to resist some write problems caused by an operating system crash. On the other hand, because a lot of data page information is recorded in the wal log, the wal log will expand. In addition, the checkpoint executed during basic backup does not care about ‘full_page_write’ value and forces full page write.
Because wal archiving only takes effect on the completed wal segment, if a wal segment is not full for a long time, the wal segment will not be archived, and therefore cannot be used by the standby mechanism in the warm standby function.
‘arcihve_timeout’ defines a time period, if a wal period exceeds this time period and is not full, a wal switch archive is forced.
6. Auto flush wal buffer(V8.3)
There are some small improvements and enhancements in version 8.3
This parameter defines the conditions for the final completion of a transaction. If the parameter value is on, the transaction will not complete the commit until the wal log of transaction commit is synchronized to the wal segment. If the parameter value is off, the wal log of transaction commit can be written to the wal cache. In later versions, this parameter was upgraded to some corresponding synchronous commits of hot standby.
In the description of the ‘commit_delay’ parameter, it is mentioned that in the case of no transaction commit, the WAL cache will not be written to the disk until the cache is full. A time interval is defined here. If the wal cache is not written in this time range, the wal cache write will be triggered.
This parameter is added to separate the archive mode from the archive command parameter.
In this blog, we discussed the WAL evolution start from version 7.1 and end at version 8.3, where PITR and warm standby had been introduced. This is the evolutionary history of the WAL and seems no end user cares about it now. But as a hacker, I should clearly know how it evolved in terms of features requirements and core developments, so I can bold changing some code in developing.
Movead.Li is kernel development of Highgo Software. Since joining into Highgo Software in 2016, Movead takes the most time on researching the code of Postgres and is good at ‘Write Ahead Log’ and ‘Database Backup And Recovery’. Base on the experience Movead has two open-source software on the Postgres database. One is Walminer which can analyze history wal file to SQL. The other one is pg_lightool which can do a single table or block recovery base on base backup and walfiles or walfiles only.
Now he has joined the HighGo community team and hopes to make more contributions to the community in the future.