How batch insertion was done in postgres_fdw

Enterprise PostgreSQL Solutions

Leave a Comment

How batch insertion was done in postgres_fdw

1. Overview

postgres_fdw has been existing in PostgreSQL for many years, and it was one of the most popular interfaces used in many extensions. such as the PostgreSQL foreign data wrappers wiki page, and PostgreSQL Extension Network/PGXN. However, I had not touched this postgres_fdw in deep until I got a task about combing multiple columns from multiple tables using this amazing Foreign Data Wrapper interface. After some hack, I was be able to combine multiple columns from multiple tables into one table based on postgres_fdw, but I was just curious about how many enhancements has been done recently. After a simple git pull, I found actually there are some new features was introduced in PG14, such as Asynchronous Execution, Bulk Inserts, Truncate as well as some network connection related enhancement. In this blog, I will take a detail look at one of the enhancements, i.e. Bulk insertion, to see how it was implemented.

2. A detail looks at bulk insertion

I haven’t performed any benchmark test for bulk inserts in postgres_fdw to compare before and after yet, but you can find a very nice blog here as your reference, if I have time I will definitely run some benchmark test on a production server. For now, let’s take a look to see how it was implemented.

The first commit was done early 2021, and here is the details information about this features.

commit b663a4136331de6c7364226e3dbf7c88bfee7145
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Wed Jan 20 23:05:46 2021 +0100

    Implement support for bulk inserts in postgres_fdw
    
    Extends the FDW API to allow batching inserts into foreign tables. That
    is usually much more efficient than inserting individual rows, due to
    high latency for each round-trip to the foreign server.
    
    It was possible to implement something similar in the regular FDW API,
    but it was inconvenient and there were issues with reporting the number
    of actually inserted rows etc. This extends the FDW API with two new
    functions:
    
    * GetForeignModifyBatchSize - allows the FDW picking optimal batch size
    
    * ExecForeignBatchInsert - inserts a batch of rows at once
    
    Currently, only INSERT queries support batching. Support for DELETE and
    UPDATE may be added in the future.
    
    This also implements batching for postgres_fdw. The batch size may be
    specified using "batch_size" option both at the server and table level.
    
    The initial patch version was written by me, but it was rewritten and
    improved in many ways by Takayuki Tsunakawa.
    
    Author: Takayuki Tsunakawa
    Reviewed-by: Tomas Vondra, Amit Langote
    Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development

As the author commented, two major functions, i.e. postgresGetForeignModifyBatchSize and postgresExecForeignBatchInsert were introduced to postgres_fdw to hook up with the extended FDW API. The extended FDW API can be found in src/include/foreign/fdwapi.h.

The corresponding implementation was done in contrib/postgres_fdw/postgres_fdw.c.
Single row insertion

Single row insertbatch rows insert
static TupleTableSlot *
postgresExecForeignInsert(EState *estate,
ResultRelInfo *resultRelInfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot)
static TupleTableSlot **
postgresExecForeignBatchInsert(EState *estate,
ResultRelInfo *resultRelInfo,
TupleTableSlot **slots,
TupleTableSlot **planSlots,
int *numSlots)

Function postgresExecForeignBatchInsert support multiple slots, planSlots as well as a new parameter, i.e. numSlots. So that it can insert multiple rows in one round communication to save the network communication overhead.

static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)

The function basically allows to retrieve the number of row remote server will support in bulk insertion. Here, as the function has a note, if there is trigger defined for remote table with AFTER ROW or with RETURNING required, the bulk insertion is not allowed, otherwise the rule will be broken on remote server side.

Other than these two major new functions, there are some update in other functions, such as, execute_foreign_modify, which will build the bulk insertion query when it received insert operation, something like below.

    if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)

and another important change is convert_prep_stmt_params which has a loop to build the number of rows that server can accept.

    for (i = 0; i < numSlots; i++)

Then rest of the changes are mainly to support the bulk insertion features, such as parse the batch_size in option. Another new function, rebuildInsertSql in deparse.c is used in execute_foreign_modify to build the actual number of rows before sending the query to remote server.

3. Limitation

As it was mentioned in the commit message, only insertion is supported for bulk operations, delete and update are not supported yet. Hopefully, these bulk operations and interfaces can be added soon. So that, there will be even more extensions using FDW interface in the future.

4. Summary

In this blog, we discussed a key performance improvement in postgres_fdw, i.e. bulk insertion and take a detail look at the source code level. I hope it can help when someone want to add a new feature to postgres_fdw or any continue improvement on bulk insertion.

Reference:

  1. F.35. postgres_fdw
  2. Faster Bulk Insertion to Foreign Tables – Introduction to PostgreSQL 14 Committed Features
  3. New Features in PostgreSQL 14: Bulk Inserts for Foreign Data Wrappers

Leave a Reply

Your email address will not be published. Required fields are marked *