1. Overview
In my previous blog, I briefly walked through how bulk/batch insertion was implemented for postgres_fdw in PG14. In this blog, I am going to run some basic tests to compare performance before and after batch insertion was introduced in postgres_fdw, so that we can get a general idea of whether this feature makes a difference.
2. PG Servers Setup
The point of this blog is to see whether batch insertion makes any difference. To keep the testing simple, here is how I set up the environment.
As bulk/batch insertion was introduced in PG14, we need to switch to the stable PG14 branch, i.e. REL_14_STABLE. After checking out the source code, simply run configure, make, and make install. Here are the commands used in this blog.
./configure --prefix=$HOME/sandbox/postgres/pgapp --enable-tap-tests --enable-debug CFLAGS="-g3 -O0"
make && make install
cd contrib/postgres_fdw/
make && make install
export PGDATA=$HOME/sandbox/postgres/pgdata
initdb -D $PGDATA
pg_ctl -D $PGDATA -l logfile start
In order to test the Foreign Data Wrapper, we need to start another PG Server. To make it easy, I simply start a second PG Server on a separate data cluster and change the default port to a different one, for example, 5433. Below are the commands used to set up the Foreign Server.
export FPGDATA=$HOME/sandbox/postgres/pgdata2
initdb -D $FPGDATA
After the Foreign data cluster has been initialized, change the port to 5433 in postgresql.conf, then start the Foreign PG Server.
vim $FPGDATA/postgresql.conf
pg_ctl -D $FPGDATA -l logfile-f start
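If you prefer not to open postgresql.conf in vim, the port change can also be scripted. Below is a minimal sketch, assuming the stock config still carries the default commented-out `#port = 5432` line; the demo works on a scratch copy, so point it at `$FPGDATA/postgresql.conf` for real use.

```shell
#!/bin/sh
# Flip the default commented-out port line to 5433 without opening an editor.
# Demo runs against a scratch file; for the real cluster, set
# CONF="$FPGDATA/postgresql.conf" instead.
CONF=$(mktemp)
echo "#port = 5432                            # (change requires restart)" > "$CONF"
sed -i 's/^#\{0,1\}port = 5432.*/port = 5433/' "$CONF"
grep '^port' "$CONF"    # prints: port = 5433
rm -f "$CONF"
```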
3. Foreign Tables Setup
Now, we can setup the basic Foreign Data Wrapper testing environment like below.
On the Local PG Server:
3.1. Create a Foreign Server using default batch settings
postgres=# create server fs foreign data wrapper postgres_fdw options (dbname 'postgres', host '127.0.0.1', port '5433');
CREATE SERVER
3.2. Create the user mapping
postgres=# create user mapping for david server fs options( user 'david');
CREATE USER MAPPING
3.3. Create Foreign Tables on Local PG Server
postgres=# create foreign table ft (id int, test text) server fs options(table_name 't');
CREATE FOREIGN TABLE
By default, the batch size is 1, which means no batching, as you can see from the source code.
/*
 * Determine batch size for a given foreign table. The option specified for
 * a table has precedence.
 */
static int
get_batch_size_option(Relation rel)
{
    ...
    /* we use 1 by default, which means "no batching" */
    int batch_size = 1;
Now, repeat the process to create another two Foreign Tables with different batch sizes, i.e. 10 and 100.
postgres=# create foreign table ft_batch10 (id int, test text) server fs options(batch_size '10', table_name 't10');
CREATE FOREIGN TABLE
postgres=# create foreign table ft_batch100 (id int, test text) server fs options(batch_size '100', table_name 't100');
CREATE FOREIGN TABLE
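The batch_size option caps how many rows postgres_fdw ships to the remote server per insert operation, so a larger batch means fewer round trips. A rough back-of-the-envelope for the 1-million-row test used later:

```shell
#!/bin/sh
# Rows per remote insert = batch_size, so round trips = rows / batch_size.
awk 'BEGIN {
  rows = 1000000
  printf "batch_size 1:   %d round trips\n", rows / 1
  printf "batch_size 10:  %d round trips\n", rows / 10
  printf "batch_size 100: %d round trips\n", rows / 100
}'
```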
3.4. Create Tables on Foreign Server
On the Foreign PG Server side, create the corresponding tables like below. Note: the table names must match the ones used on the Local PG Server.
psql -d postgres -p 5433
postgres=# create table t (id int, test text);
CREATE TABLE
postgres=# create table t10 (id int, test text);
CREATE TABLE
postgres=# create table t100 (id int, test text);
CREATE TABLE
4. Run the tests
Now, enable timing on the Local PG Server, then simply run the commands below and record the timings for comparison.
Turn on timing, then insert 1k, 1 million, and 100 million records without using batch insertion.
postgres=# \timing
Timing is on.
insert into ft values(generate_series(1,1000),'hello, world');
select count(*) from ft;
delete from ft ;
insert into ft values(generate_series(1,1000000),'hello, world');
select count(*) from ft;
delete from ft ;
insert into ft values(generate_series(1,100000000),'hello, world');
select count(*) from ft;
delete from ft ;
Insert 1k, 1 million, and 100 million records with batch sizes 10 and 100.
insert into ft_batch10 values(generate_series(1,1000),'hello, world');
select count(*) from ft_batch10;
delete from ft_batch10 ;
insert into ft_batch10 values(generate_series(1,1000000),'hello, world');
select count(*) from ft_batch10;
delete from ft_batch10 ;
insert into ft_batch10 values(generate_series(1,100000000),'hello, world');
select count(*) from ft_batch10;
delete from ft_batch10 ;
insert into ft_batch100 values(generate_series(1,1000),'hello, world');
select count(*) from ft_batch100;
delete from ft_batch100 ;
insert into ft_batch100 values(generate_series(1,1000000),'hello, world');
select count(*) from ft_batch100;
delete from ft_batch100 ;
insert into ft_batch100 values(generate_series(1,100000000),'hello, world');
select count(*) from ft_batch100;
delete from ft_batch100 ;
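Rather than typing each statement, the whole test above can be generated with a small loop; this simply reproduces the statements listed in this section, and the output can be piped into psql.

```shell
#!/bin/sh
# Emit the insert/count/delete statements for every table and row count.
# Pipe the output into psql to run the whole suite, e.g.:
#   sh gen_tests.sh | psql -d postgres
for t in ft ft_batch10 ft_batch100; do
  for n in 1000 1000000 100000000; do
    echo "insert into $t values(generate_series(1,$n),'hello, world');"
    echo "select count(*) from $t;"
    echo "delete from $t;"
  done
done
```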
5. Check the results
Here are the results from the above tests.
rows        | no batch                   | batch size 10             | batch size 100            |
1k          | 55.634 ms                  | 9.996 ms                  | 25.545 ms                 |
1 million   | 32155.960 ms (00:32.156)   | 5927.090 ms (00:05.927)   | 4754.158 ms (00:04.754)   |
100 million | 3237972.881 ms (53:57.973) | 623204.920 ms (10:23.205) | 332998.911 ms (05:32.999) |
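To turn the raw timings into speedups, divide the no-batch time by each batched time; the numbers below are computed directly from the table above.

```shell
#!/bin/sh
# Speedup = no-batch time / batched time, per row count (times in ms).
awk 'BEGIN {
  printf "1k rows:   %.1fx (batch 10), %.1fx (batch 100)\n", 55.634 / 9.996, 55.634 / 25.545
  printf "1M rows:   %.1fx (batch 10), %.1fx (batch 100)\n", 32155.960 / 5927.090, 32155.960 / 4754.158
  printf "100M rows: %.1fx (batch 10), %.1fx (batch 100)\n", 3237972.881 / 623204.920, 3237972.881 / 332998.911
}'
```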
6. Summary
In this blog, we simply ran some basic tests to see whether there is any performance improvement in postgres_fdw from batch/bulk insertion. The results are very impressive: for the 1k insertion, batch sizes 10 and 100 are roughly 5.6 and 2.2 times faster; for the 1 million insertion, batch sizes 10 and 100 are roughly 5.4 and 6.8 times faster; for the 100 million insertion, batch sizes 10 and 100 are roughly 5.2 and 9.7 times faster.

A software developer specializing in C/C++ programming with experience in hardware, firmware, software, database, network, and system architecture. Now working at HighGo Software Inc. as a senior PostgreSQL architect.