Who is spending wal crazily

Enterprise PostgreSQL Solutions

Comments are off

Who is spending wal crazily

Recently, some people are asking why the wal of the database is consumed so quickly. This blog gives a simple way to analyze where the wal log is consumed, and you can get the reason for wal expansion.

Note: There are two understandings about wal expansion: 1. Too many LSNs are consumed in a period; 2. There are too many wal files in the $DATA/pg_wal directory. This blog is only to illustrate the first situation.

1-Statistics WAL

The pg_waldump is a built-in PostgreSQL tool, and we generally use it to simply view the content of wal. This tool can also be used to count the length of various wal types, as shown in the figure below:

wal_show

Let’s take a brief look at how to understand this statistical result:

1.1 Command interpretation

./pg_waldump -p ../walback -s 0/A000000 -e 0/58000000 -stats=record

-p: Specify the directory that wal saves

-s: Specifies the starting LSN

-e: Specifies the end LSN

-stats=record: Specifies how the resolution results are rendered.

Of course, pg_waldump tool also has other command combinations, which can be studied if you are interested.

1.2 Interpretation of output results

①Type column

The type column is divided into red and green boxes. The red box part is the classification of wal type, and the green box part is the specific wal log type. For example, under the category of heap, there are wal types such as insert, update, delete, etc. For details, please refer to the source code rmgrlist.h.

②N column

Represents the number of times a wal type occurs, as shown in the figure, 18474 times of insert. The percentage column after n column indicates the proportion of this type of wal, such as 18474 / 486623 ≈ 3.8%.

③Record size column、FPI size column、Combined size column

The total length occupied by a certain wal type is displayed in the Combined size column, where the length of FPI data is in the FPI size column, and the length of non-FPI data is in the Record size column. Then follow the percentage of each of them.

For example, the wal type INSERT has a total length of 1503132, in which the length of FPI data is 40364, the length of non-FPI data is 1462768.

Note: FPI refers to the backup of a page in a wal record.

2-Statistical information analysis

From the above pictures, we can see that MULTI_INSERT+INIT wal type occupied the most length of wal (955429108 ≈ 911m). Those who are familiar with this type of wal will know that using COPY to import a large amount of data will produce. Now we know the main composition data of pointed wal logs.

However, if you want to use this analysis method, you need to understand the meaning of various wal types. The most direct way to understand these is to learn the source code rmgrlist.h. Here we introduce some wal types that may consume a lot of wal logs.

RM_XLOG_ID classification

XLOG_FPI_FOR_HINT:

Set the checksum for pages or wal_log_hints parameter, a large number of such wal types will be generated and a lot of wal logs will be consumed.

XLOG_FPI:

When the index page is initialized and the table is reorganized, such as vacuum full or cluster, this wal type will be generated in large quantities.

RM_HEAP2_ID classification

XLOG_HEAP2_MULTI_INSERT:

This type usually appears when a large amount of data is imported, XLOG_HEAP2_MULTI_INSERT+INIT means that the page is initialized when data is inserted.

XLOG_HEAP2_CLEAN:

This type is generated when vacuum cleans up table garbage tuples.

XLOG_HEAP2_VISIBLE:

Wal log generated when VM file of data page changes.

RM_HEAP_ID classification

XLOG_HEAP_INSERT、XLOG_HEAP_DELETE、XLOG_HEAP_UPDATE

The most common type of wal, this is needless to say.

RM_BTREE_ID、RM_HASH_ID、RM_GIN_ID、RM_GIST_ID、RM_SPGIST_ID、RM_BRIN_ID classification

Index related wal logs can also lead to wal expansion.

RM_SEQ_ID classification

The wal log generated by the sequence does not appear to cause the wal to expand.

RM_XACT_ID、RM_SMGR_ID、RM_CLOG_ID、RM_DBASE_ID、RM_TBLSPC_ID、RM_MULTIXACT_ID、RM_RELMAP_ID、RM_REPLORIGIN_ID、RM_GENERIC_ID、RM_LOGICALMSG_ID RM_COMMIT_TS_ID classification

Generally, it will not cause wal expansion too much.

Let’s run the pg_waldump and check the items.

3-Current problem

Also, you may find that the statistical total size, such as 1302829755 in the above figure, may be very different from the total size of the wal section file you entered. I think, pg_waldump do not count wal consume caused by select pg_switch_wal(), maybe a bug of pg_waldump. Therefore, if archive_timeout is configured may cause some wal consumption not to be counted by pg_waldump tool.

4-Conclusion

In this blog, we propose a statistical method of wal log consumption, and propose some cases that will generate a lot of wal. I believe that after you understand the wal type and its proportion, you will know what kind of operations your database is carrying out in large quantities, and you can even adjust the database parameters according to these data to seek better performance.