Built-in Sharding Architecture
The build-in sharding feature in PostgreSQL is using the FDW based approach, the FDW’s are based on sql/med specification that defines how an external data source can be accessed from the PostgreSQL server. PostgreSQL provides number of foreign data wrapper (FDW’s) that are used for accessing external data sources, the postgres_fdw is used for accessing Postgres database running on external server, MySQL_fdw is used for accessing MySQL database from PG, MongoDB_fdw is used for accessing MongoDB and so on.
The diagram below explains the current approach of built-in Sharding in PostgreSQL, the partitions are created on foreign servers and PostgreSQL FDW is used for accessing the foreign servers and using the partition pruning logic the planner decides which partition to access and which partitions to exclude from the search.
Push Down Capabilities
Push down in this context is the ability to push parts of the foreign query to foreign servers in order to decrease the amount of data travelling from the foreign server to parent node. The two basic push-down techniques that have been part of postgres fdw from the start are select target-list pushdown and WHERE clause pushdown.
In the query above the planner will decide which partition to access based on the partition key i.e. logdate in this case. The WHERE clause will be pushed down to the foreign server that contains the respective partition. That’s the basic push down capabilities available in postgres_fdw.
The sharding feature requires more advanced push-down capabilities in order to push the maximum operations down to the foreign servers containing partitions and minimising the data sent over the wire to the parent node.
The above is a decent set of push down capabilities that have been added to PostgreSQL in last few major releases. The good thing about these features is that it already benefits a number of use cases even when the entire sharding feature is not in place.
What’s remaining and associated challenges
The are still a number of important features remaining before we can say that we have Sharding feature in PostgreSQL. In this section we are going to discuss these features and what are the challenges with these features. I am sure there are other features related to database cluster management i.e. backup/failover or monitoring that are not in this list.
1- 2PC for foreign data wrapper transactions
Currently FDW transactions don’t support two phase commit, this means that if you are using multiple foreign servers in a transaction and if one part of transaction fails in one foreign server then the entire transaction on all foreign serves are suppose to fail. This feature is required in order to guarantee data consistency across the database cluster.
This feature is required in order to support OLTP workload hence it is very important for sharding feature.
The design proposal and patches for this feature has been sent on hackers for last several years but it is not getting enough community interest hence the design of this feature is still outstanding.
2- Parallel foreign scan
When a query is querying multiple foreign scans in a single query, all the foreign scans are being executed in a sequential manner, one after another. The parallel foreign scan functionality is executing multiple foreign scans in parallel. This feature is really important for the OLAP test cases, for example if you are running AVG query on a large partition table that is divided over large number of partitions. The AVG operation will be sent to each foreign server sequentially and results from each foreign server is sent to the parent node which will be aggregate on the parent node and sent back to client. Once we have the parallel foriegn scan functionality, all the average operations on all the foreign servers will be executed in parallel and results sent to the parent node. The parent node will aggregate the data and sent the results to the client.
This is key piece needed for completing the sharding feature, we currently have aggregate pushdown that will send the aggregates down to the foreign server but we don’t have the functionality to run the aggregate operations on all the partitions in parallel.
This feature is particularly very important for the OLAP use-case, the idea of having a large number of foreign servers containing partition for a large partitioned table and aggregate operation on partition running on all the foreign servers in parallel is very powerful.
The infrastructure for parallel foreign scan feature is asynchronous query execution, this is a major change in PostgreSQL. There has been some work done on this but it feels that it is still a release or two away before it will be committed. Once asynchronous query execution is done, it will be easier to add parallel foreign scan functionality.
3- Shard management
The partitions on foreign servers are currently not getting created automatically, as described in “Sharding in PostgreSQL” section, the partitions needs to be created manually on foreign servers. This can be very tedious task if you are creating a partition table with large number of partitions and sub-partitions.
The shard management feature is suppose to provide the ability to auto-create the partitions and sub-partitions on the foreign servers. This will make the creation of sharded tables very easy.
Not intending to go into any design details of how this feature will be implemented, the basic idea is that Sharded table syntax will be built on top on declarative sharding syntax. The postgres_fdw will be used to pushdown the DDL to the foreign servers, while the FDW’s are only meant to do SELECT or DML, doing DDL on external source is not part of sql/med specification. Anyhow we aren’t suppose to discuss the design of this feature in this blog.
This feature is not yet started in the community, the development team at HighGo is planning to work on this feature.
4- Global Transaction Manager / Snapshot Manager
This is another very important and difficult feature that is mandatory for Sharding feature. The purpose of global transaction/snapshot manager is suppose to provide global transactional consistency. The problem described in section 1 of this chapter “2PC for foreign data wrapper transactions” also ties in with the Global transaction manager.
Lets suppose you have two concurrent clients that are using a sharded table, client #1 is trying to access a partition that is on server 1 and client #2 is also trying to access the partition that is also on server 1. Client 2 should get a consistent view of the partition i.e. any changes i.e. updates etc made to the partition during client 1 transaction shouldn’t be visible to client 2. Once client 1 transaction gets committed, the charges will be visible to all new transactions. The Global transaction manager is suppose to ensure that all global transaction gets a consistent view of the database cluster. All the concurrent clients using the database cluster (with tables sharded across multiple foreign servers) should see consistent view of the database cluster.
This is hard problem to solve and companies like Postgres Professional have tried to solve this problem by using a external transaction manager. So far there doesn’t seem to be any solution accepted by the community. Right now there is no visible concentrated effort which is trying to implement the global transaction manager in the core or even as an external component.
There is mention of using other approaches like Clock-SI (Snapshot isolation for Partitioned tables) approach that is followed by other successful projects like Google cloud spanner and YugaByte for solving the same problem.
This is conclusion of all the 3 blogs of this series, horizontal scalability with sharding is imperative for PostgreSQL. It is possible that only some of the workloads need sharding today in order to solve there problems but I am sure everyone wants to know that PostgreSQL has a answer of this problem. It is also important to note that Sharding is not a solution for all big data or high concurrent workloads, you need to pick workloads where larger table can be logically partitioned across partitions and the queries are benefiting from the pushdown and other capabilities in using the sharded cluster.
As I mentioned in the initial section of this blog, the first target for the sharding features where it is complete is to be able to speed-up a long running complex query. This would be a OLAP query, not to say that sharing would benefit the OLTP workloads. The data would be partitioned across multiple servers instead of a single server.
Another important exercise that the sharding team should start to do soon is benchmarking using the capabilities already part of PostgreSQL. I know without parallel foreign scan, it is not possible to speed up a real OLAP query that uses multiple partitions. However the process of benchmarking should being soon, we need to identify the type of workload that should benefit from sharding, what is the performance without sharding and what performance to expect with a sharded cluster. I don’t think we can expect the performance to be linear as we add more shards to the cluster.
Another important point that I would to mention here that there has been critics about using the FDW machinery for implementing built-in sharding. There has been suggestion to go a more low level in order to efficiently handle cross node communication etc. The answer given by a senior community member is good one, we are using FDW machinery to implement this feature because that’s the quickest and less error prone route for implementing it. The FDW functionality is already tried and tested, if we try to implement using a approach that’s more complex and sophisticated, it will require allot of recourses and lots of time before we can produce something that we can call sharding.
It will take more then a few companies to invest there resources in building this big feature. So more and more companies should come together on implementing this feature in the community because it is Worth it.