Almost every organisation that I interact with wants a high availability system for PostgreSQL. This clearly depicts an active trend toward an increase in utilising PostgreSQL for critical business applications. In some cases it is a move away from other major database systems like Oracle or even Teradata.
It’s not possible to cover this topic in a single blog. So expect this to be a series of many blogs, perhaps five or six. Specifically in this blog we are going to define the basic criteria for a high availability system, and have a quick glance at the current open source solutions available in the market.
Before we jump on the high availability bandwagon, let’s establish the definitions that are essential in creating one.
For sake of clarity, we’ll use uptime and service availability interchangeably with both meaning that PostgreSQL service is up and running, connectable and can respond to queries as expected.
Why High Availability?
The answer to this question is very straightforward. “High Availability” system ensures service continuity. In an ideal world, we’d want a 100% service uptime. In a slightly less ideal and more realistic one, nines are aimed for; three, four or five 9s.
High availability is not just about service continuity, it is also about the system’s ability to scale and manage workload. The system must be able handle average and peak workloads.
Then it is also about recovery from a failure. A system must be designed so that it specifically meets Recovery Point Objective (RPO) and Recovery Time Objective (RTO) criteria. The smaller the value of these two objectives, the smaller the data loss and quicker the recovery from a failure.
Defining High Availability
There are key attributes that define a high availability system. It is important that we define these attributes before assessing these solutions.
- There should not be any single point of failures in the system.
- Continuous health monitoring of backend servers/systems.
- Reliable failover in case of failure.
Although not a core requirement, but good for disaster management is to distribute resources geographically to avoid region specific failures.
High Availability for PostgreSQL in Single Primary Cluster
In terms of high availability for PostgreSQL, we need to redefine these in a database context. We’ll be using this criteria to evaluate single primary high availability solutions for PostgreSQL.
- Continuous server health monitoring.
- Has a notification mechanism that can inform administrators about server or cluster degradation.
- Has a way of connecting to the primary database server at all times.
- In case of primary PostgreSQL server failure, the most suitable standby server is promoted seamlessly.
- In case of standby failure, the system can restart the service or notify an administrator.
- System has fencing ability:
- In case of network isolation, the system is able to avoid split-brain scenarios.
- In case of replication not happening on a given standby, the system is able to identify that, notify administrator and remove the faulty system from the cluster.
Open Source High Availability Solutions
The solutions that we are going to evaluate are some of the most common ones out there. The candidates are:
- PostgreSQL Automatic Failover (PAF)
- RepMgr [Replication Manager]
Each of these solutions are going to be evaluated in separate blogs against the six point criteria set earlier. However, let’s familiarise ourselves with what these solutions are.
PgPool is a connection pooler that has been around for a while though during the past few years, it has improved leaps and bounds. The addition of Watchdog in version 3.2 really boosted its credentials as a candidate for a high availability solution. This was further enhanced in version 3.5 to ensure quorum presence at all times. This, thereby attempts to avoid the dreaded split-brain scenarios.
|License||PgPool Global Development Group|
|First Release||August 2012 [With Watchdog]|
|PostgreSQL Support Matrix||9.3 and above|
PostgreSQL Automatic Failover (PAF)
PostgreSQL Automatic Failover is an Open Cluster Framework (OCF) compliant resource agent for Pacemaker and Corosync stack. It exposes the status of PostgreSQL servers to Pacemaker. As a resource agent, it monitors the resource, PostgreSQL database in this case and tells Pacemaker in case there is a failure.
|License||Proprietary; PostgreSQL equivalent license|
|First Release||March 2016|
|Dependencies||Pacemaker with corosync|
|PostgreSQL Support Matrix||9.3 and above|
Replication Manager (RepMgr)
Replication Manager enhances PostgreSQL’s built-in tools for replication. There are no specific setup requirements except for the fact that RepMgr is not supported on Windows. It requires creation of a database user and database. Through its utilities, it enhances setup, monitoring and administrative tasks including failovers. RepMgr has two core utilities; repmgr and repmgrd.
|First Release||December 2010|
|Dependencies||SSH and rsync|
|PostgreSQL Support Matrix||9.3, 9.4 with some restrictions; 9.5 and above|
Patroni is more like a template rather than a complete all-purpose solution that fits all your needs. It provides a way of engaging existing solutions with PostgreSQL to create a high availability setup. Setting up Patroni requires configuring various components. It starts with setting up psycopg2, configuration a DCS and pushing in configuration yaml files for PostgreSQL servers, and finally setting up HAProxy which provides a single way of connecting to the primary database server. Watchdog is additionally required to monitor system health.
|First Release||September 2015|
|Dependencies||yamlPsycopg2Python module for the required DCS (Distributed Configuration System) including either etcd, zookeeper, exhibitor or Kubernetes.HA ProxyWatchdog|
|PostgreSQL Support Matrix||Not available.|
Lots to Look Forward To
This was all about giving an overview on PostgreSQL high availability and setting the stage for my future work around high availability setup and evaluation. If the github stars is the criteria to go by, then Patroni knocks all other solutions out. However, to rely on stars to help us navigate our way through this tricky journey, we’d have to look at various other aspects including how the solutions have recently evolved, in particular, PgPool-II which has significantly improved its ability to handle an HA cluster.
Stay tuned as lots more is to come on PostgreSQL high availability solutions.
Hamid has more than 19 years of professional software development experience and has been involved with PostgreSQL for more than 8 years now. He was part of EnterpriseDB and managed PostgreSQL official installers for Windows, Linux and MacOS. Hamid has recently joined HighGo Software Inc. as Senior DevOps and Database Architect.