postgres_fdw has been existing in PostgreSQL for many years, and it was one of the most popular interfaces used in many extensions. such as the PostgreSQL foreign data wrappers wiki page, and PostgreSQL Extension Network/PGXN. However, I had not touched this postgres_fdw in deep until I got a task about combing multiple columns from multiple tables using this amazing Foreign Data Wrapper interface. After some hack, I was be able to combine multiple columns from multiple tables into one table based on postgres_fdw, but I was just curious about how many enhancements has been done recently. After a simple git pull, I found actually there are some new features was introduced in PG14, such as Asynchronous Execution, Bulk Inserts, Truncate as well as some network connection related enhancement. In this blog, I will take a detail look at one of the enhancements, i.e. Bulk insertion, to see how it was implemented.
2. A detail looks at bulk insertion
I haven’t performed any benchmark test for bulk inserts in postgres_fdw to compare before and after yet, but you can find a very nice blog here as your reference, if I have time I will definitely run some benchmark test on a production server. For now, let’s take a look to see how it was implemented.
The first commit was done early 2021, and here is the details information about this features.
commit b663a4136331de6c7364226e3dbf7c88bfee7145 Author: Tomas Vondra <email@example.com> Date: Wed Jan 20 23:05:46 2021 +0100 Implement support for bulk inserts in postgres_fdw Extends the FDW API to allow batching inserts into foreign tables. That is usually much more efficient than inserting individual rows, due to high latency for each round-trip to the foreign server. It was possible to implement something similar in the regular FDW API, but it was inconvenient and there were issues with reporting the number of actually inserted rows etc. This extends the FDW API with two new functions: * GetForeignModifyBatchSize - allows the FDW picking optimal batch size * ExecForeignBatchInsert - inserts a batch of rows at once Currently, only INSERT queries support batching. Support for DELETE and UPDATE may be added in the future. This also implements batching for postgres_fdw. The batch size may be specified using "batch_size" option both at the server and table level. The initial patch version was written by me, but it was rewritten and improved in many ways by Takayuki Tsunakawa. Author: Takayuki Tsunakawa Reviewed-by: Tomas Vondra, Amit Langote Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
As the author commented, two major functions, i.e.
postgresExecForeignBatchInsert were introduced to postgres_fdw to hook up with the extended FDW API. The extended FDW API can be found in
The corresponding implementation was done in
Single row insertion
|Single row insert||batch rows insert|
|static TupleTableSlot *|
|static TupleTableSlot **|
Function postgresExecForeignBatchInsert support multiple slots, planSlots as well as a new parameter, i.e. numSlots. So that it can insert multiple rows in one round communication to save the network communication overhead.
static int postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
The function basically allows to retrieve the number of row remote server will support in bulk insertion. Here, as the function has a note, if there is trigger defined for remote table with AFTER ROW or with RETURNING required, the bulk insertion is not allowed, otherwise the rule will be broken on remote server side.
Other than these two major new functions, there are some update in other functions, such as,
execute_foreign_modify, which will build the bulk insertion query when it received insert operation, something like below.
if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
and another important change is
convert_prep_stmt_params which has a loop to build the number of rows that server can accept.
for (i = 0; i < numSlots; i++)
Then rest of the changes are mainly to support the bulk insertion features, such as parse the
batch_size in option. Another new function,
deparse.c is used in
execute_foreign_modify to build the actual number of rows before sending the query to remote server.
As it was mentioned in the commit message, only insertion is supported for bulk operations, delete and update are not supported yet. Hopefully, these bulk operations and interfaces can be added soon. So that, there will be even more extensions using FDW interface in the future.
In this blog, we discussed a key performance improvement in postgres_fdw, i.e. bulk insertion and take a detail look at the source code level. I hope it can help when someone want to add a new feature to postgres_fdw or any continue improvement on bulk insertion.