Parallel execution of postgres_fdw scans in PG-14 (an important step forward for horizontal scaling)


Back in August 2019, I wrote about horizontal scaling / sharding in PostgreSQL and where it is going: https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/. One of the key features that I talked about in that blog is parallel foreign scan, i.e. asynchronous append of FDW nodes. The first patch of this feature has finally been committed to PG-14. While more improvements are needed to provide this capability to a wider range of queries and workloads, it is certainly an important and major step forward for FDW-based horizontal sharding. The primary purpose of this feature is to increase the performance of FDW queries in a sharded environment: by running the FDW scans in parallel on several shards, the overall query performance is greatly improved. More on this later in the blog.

Here is a brief history of the feature:

The feature was conceptually thought out several years ago, and a proposal with WIP patches was submitted to the hackers mailing list, but it was put on the back burner because the executor refactoring required to make it happen had not been done. Without going that far back in history, the initial set of patches providing the executor refactoring and asynchronous append of postgres_fdw nodes was submitted to hackers by Kyotaro Horiguchi from NTT. Movead Li from HighGo, Andrey Lepikhov from PostgresPro, Fujita-san from NTT and other hackers chimed in with reviews, feedback and benchmarking of the proposed approach. While the feature showed great results when a postgres_fdw query was executed over multiple shard nodes in parallel, there were concerns about causing performance degradation for non-FDW queries following other code paths. To cut a long story short, the feature was debated heavily in the community, generating more than 100 emails. Fujita-san from NTT, using the base patches from Horiguchi, shared a slightly different approach; this patch was originally driven by Robert Haas's idea for implementing the functionality. The patch from Fujita-san went through rigorous community review, and benchmarking was done to show the performance improvement. The feature was eventually committed for PG-14, providing asynchronous execution of postgres_fdw Append nodes.

Here is the link to the commit message of the initial patch committed for providing asynchronous execution of postgres_fdw append nodes:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=27e1f14563cf982f1f4d71e21ef247866662a052

Please see the complete commit message for details; I am just quoting its opening sentence:

This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.

The feature was authored by the following hackers:

Robert Haas (EDB)

Kyotaro Horiguchi (NTT)

Thomas Munro (EDB)

Etsuro Fujita (NTT)

It was reviewed by the following hackers :

Kyotaro Horiguchi (NTT)

Konstantin Knizhnik (PostgresPro)

Andrey Lepikhov (PostgresPro)

Movead Li (HighGo)

Thomas Munro (EDB)

Justin Pryzby (2ndQuadrant)

Here is the commitfest entry: https://commitfest.postgresql.org/32/2491/

I am using the example below from the hackers email thread to demonstrate the usage of this feature.

I will start by creating the postgres_fdw extension, followed by creating two shards on the same machine along with user mappings etc.

create extension postgres_fdw;

-- Create the foreign servers shard_1, shard_2 and user mappings. Note that the async_capable 'true' option is added to the CREATE SERVER command in order to enable asynchronous execution of postgres_fdw scans.

drop server if exists shard_1 cascade;
CREATE SERVER shard_1
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', dbname 'highgo', port '5444', async_capable 'true');

drop server if exists shard_2 cascade;
CREATE SERVER shard_2
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', dbname 'highgo', port '5555', async_capable 'true');

CREATE USER MAPPING FOR ahsanhadi
SERVER shard_1
OPTIONS (user 'ahsanhadi', password 'highgo');

CREATE USER MAPPING FOR ahsanhadi
SERVER shard_2
OPTIONS (user 'ahsanhadi', password 'highgo');

Let’s create the parent partitioned table, the foreign tables that serve as its child partitions, and populate the data in the child partitions created on the two shards.

create table pt (a int, b int, c text) partition by range (a);

create foreign table p1 (a int, b int, c text) 
server shard_1 
options (table_name 'loct1');

create foreign table p2 (a int, b int, c text) 
server shard_2
options (table_name 'loct2');

create foreign table p3 (a int, b int, c text) 
server shard_2 
options (table_name 'loct3');

alter table pt attach partition p1 for values from (10) to (20);
alter table pt attach partition p2 for values from (20) to (30);
alter table pt attach partition p3 for values from (30) to (40);
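
One detail the walkthrough glosses over: the remote tables referenced by the table_name options (loct1, loct2, loct3) must already exist in the highgo database on each shard before the inserts below will work. A minimal sketch, assuming the same column definitions as the foreign tables:

```sql
-- On the shard_1 instance (port 5444), in database 'highgo':
create table loct1 (a int, b int, c text);

-- On the shard_2 instance (port 5555), in database 'highgo':
create table loct2 (a int, b int, c text);
create table loct3 (a int, b int, c text);
```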

insert into p1 select 10 + i % 10, i, to_char(i, 'FM00000') from
generate_series(0, 99999) i;

insert into p2 select 20 + i % 10, i, to_char(i, 'FM00000') from
generate_series(0, 99999) i;

insert into p3 select 30 + i % 10, i, to_char(i, 'FM00000')
from generate_series(0, 99999) i;

Okay, so we are ready to see the asynchronous append feature and its performance benefits.

-- Enable the configuration parameter
set enable_async_append to on;

-- See the example plan for the count(*) query on the parent partitioned table; the foreign scans on all partitions are executed in parallel.

postgres=# EXPLAIN (VERBOSE) select count(*) from pt;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Aggregate  (cost=8882.01..8882.02 rows=1 width=8)
   Output: count(*)
   ->  Append  (cost=100.00..8382.00 rows=200001 width=0)
         ->  Async Foreign Scan on public.p1 pt_1  (cost=100.00..3641.00 rows=100000 width=0)
               Remote SQL: SELECT NULL FROM public.loct1
         ->  Async Foreign Scan on public.p2 pt_2  (cost=100.00..3641.00 rows=100000 width=0)
               Remote SQL: SELECT NULL FROM public.loct2
         ->  Async Foreign Scan on public.p3 pt_3  (cost=100.00..100.00 rows=1 width=0)
               Remote SQL: SELECT NULL FROM public.loct3
(9 rows)

-- See how long it takes to run the async-enabled query.
select count(*) from pt;
 count  
--------
 200000
(1 row)

Time: 81.282 ms

Let's now disable asynchronous append and see the changes in the explain plan and query timing.

set enable_async_append to off;
SET

postgres=# EXPLAIN (VERBOSE) select count(*) from pt;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Aggregate  (cost=8882.01..8882.02 rows=1 width=8)
   Output: count(*)
   ->  Append  (cost=100.00..8382.00 rows=200001 width=0)
         ->  Foreign Scan on public.p1 pt_1  (cost=100.00..3641.00 rows=100000 width=0)
               Remote SQL: SELECT NULL FROM public.loct1
         ->  Foreign Scan on public.p2 pt_2  (cost=100.00..3641.00 rows=100000 width=0)
               Remote SQL: SELECT NULL FROM public.loct2
         ->  Foreign Scan on public.p3 pt_3  (cost=100.00..100.00 rows=1 width=0)
               Remote SQL: SELECT NULL FROM public.loct3
(9 rows)

select count(*) from pt;
 count  
--------
 200000
(1 row)

Time: 136.799 ms

As you can see, the same query took considerably longer to complete without asynchronous foreign scans, even though both foreign servers are set up on the same machine for demonstration purposes. With the foreign servers on different machines, the real benefit of asynchronous over synchronous scans becomes apparent.
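
Beyond \timing, the per-node contribution to the total runtime can be inspected with EXPLAIN (ANALYZE). A sketch (output omitted here, since the actual numbers depend on the environment):

```sql
-- Show actual, not just estimated, execution time for each
-- Async Foreign Scan node under the Append:
EXPLAIN (ANALYZE, VERBOSE) select count(*) from pt;
```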

In the example above, I have demonstrated the usage of this feature with the postgres_fdw extension. It is currently only implemented for postgres_fdw, but the same functionality can be extended to other FDWs. In the example, I created a partitioned table whose partitions live on multiple shards. The PostgreSQL shards were created on the same server here, but in the real world they would be created on separate foreign servers. The parent table is range partitioned across the shards, so a query on the parent table generates postgres_fdw scans for the multiple shards, and the asynchronous append of postgres_fdw nodes makes it possible to execute those scans in parallel.
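Note that async_capable is not only a server-level option: postgres_fdw also accepts it as a per-foreign-table option, so asynchronous execution can be tuned for individual partitions. A small sketch, using the tables from the example above:

```sql
-- Override the server-level setting for a single partition:
ALTER FOREIGN TABLE p3 OPTIONS (ADD async_capable 'false');

-- Revert, so p3 again inherits the server-level setting:
ALTER FOREIGN TABLE p3 OPTIONS (DROP async_capable);
```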

The feature committed in PG-14 has some limitations and room for improvement, which will be addressed in future major releases. The following enhancements are being discussed in the hackers thread:

  • Enable the asynchronous append functionality for TPC-H queries; currently the TPC-H benchmark queries can’t benefit from this feature
  • Enable the asynchronous append functionality for UNION queries
  • Get the asynchronous append functionality working with merge append plans

Conclusion

I have been blogging about FDW-based sharding in PostgreSQL; it is a complex yet very important feature that will greatly benefit many workloads. Getting this feature into PG-14 is a major step forward in the direction of FDW-based sharding. Other features, like two-phase commit for FDW transactions and global visibility, are in progress in the community and are also very important for write workloads. However, the ability to do parallel foreign scans was crucial for getting even a minimal OLAP workload working with a sharded cluster, since OLAP/analytical queries are the target arena for sharded clusters.

So let’s hope that we can further improve this feature to handle a wider range of queries, and that the other sharding features in progress make it into PG-15 and onwards.
