PostgreSQL High Availability: The Considerations and Candidates

Enterprise PostgreSQL Solutions

Comments are off

PostgreSQL High Availability: The Considerations and Candidates

Almost every organisation that I interact with wants a high availability system for PostgreSQL. This clearly depicts an active trend toward an increase in utilising PostgreSQL for critical business applications. In some cases it is a move away from other major database systems like Oracle or even Teradata.

It’s not possible to cover this topic in a single blog. So expect this to be a series of many blogs, perhaps five or six. Specifically in this blog we are going to define the basic criteria for a high availability system, and have a quick glance at the current open source solutions available in the market.

Before we jump on the high availability bandwagon, let’s establish the definitions that are essential in creating one.

For sake of clarity, we’ll use uptime and service availability interchangeably with both meaning that PostgreSQL service is up and running, connectable and can respond to queries as expected.

Why High Availability?

The answer to this question is very straightforward. “High Availability” system ensures service continuity. In an ideal world, we’d want a 100% service uptime. In a slightly less ideal and more realistic one, nines are aimed for; three, four or five 9s.

High availability is not just about service continuity, it is also about the system’s ability to scale and manage workload. The system must be able handle average and peak workloads.

Then it is also about recovery from a failure. A system must be designed so that it specifically meets Recovery Point Objective (RPO) and Recovery Time Objective (RTO) criteria. The smaller the value of these two objectives, the smaller the data loss and quicker the recovery from a failure.

Defining High Availability

There are key attributes that define a high availability system. It is important that we define these attributes before assessing these solutions.

  • There should not be any single point of failures in the system.
  • Continuous health monitoring of backend servers/systems.
  • Reliable failover in case of failure.

Although not a core requirement, but good for disaster management is to distribute resources geographically to avoid region specific failures.

High Availability for PostgreSQL in Single Primary Cluster

In terms of high availability for PostgreSQL, we need to redefine these in a database context. We’ll be using this criteria to evaluate single primary high availability solutions for PostgreSQL.

  1. Continuous server health monitoring.
  2. Has a notification mechanism that can inform administrators about server or cluster degradation.
  3. Has a way of connecting to the primary database server at all times.
  4. In case of primary PostgreSQL server failure, the most suitable standby server is promoted seamlessly.
  5. In case of standby failure, the system can restart the service or notify an administrator.
  6. System has fencing ability:
    1. In case of network isolation, the system is able to avoid split-brain scenarios.
    2. In case of replication not happening on a given standby, the system is able to identify that, notify administrator and remove the faulty system from the cluster.

Open Source High Availability Solutions

The solutions that we are going to evaluate are some of the most common ones out there. The candidates are:

  1. PgPool-II
  2. PostgreSQL Automatic Failover (PAF)
  3. RepMgr [Replication Manager]
  4. Patroni

Each of these solutions are going to be evaluated in separate blogs against the six point criteria set earlier. However, let’s familiarise ourselves with what these solutions are.

PgPool-II

PgPool is a connection pooler that has been around for a while though during the past few years, it has improved leaps and bounds. The addition of Watchdog in version 3.2 really boosted its credentials as a candidate for a high availability solution. This was further enhanced in version 3.5 to ensure quorum presence at all times. This, thereby attempts to avoid the dreaded split-brain scenarios.

Key InformationDetails
CompanyCommunity Driven
LicensePgPool Global Development Group
First ReleaseAugust 2012 [With Watchdog]
GitHub URLhttps://github.com/pgpool/pgpool2
GitHub Stars120+
DependenciesNA
PostgreSQL Support Matrix9.3 and above
Core LanguageC

PostgreSQL Automatic Failover (PAF)

PostgreSQL Automatic Failover is an Open Cluster Framework (OCF) compliant resource agent for Pacemaker and Corosync stack. It exposes the status of PostgreSQL servers to Pacemaker. As a resource agent, it monitors the resource, PostgreSQL database in this case and tells Pacemaker in case there is a failure.

Key InformationDetails
CompanyCluster Labs
LicenseProprietary; PostgreSQL equivalent license
First ReleaseMarch 2016
GitHub URLhttps://github.com/ClusterLabs/PAF
GitHub Stars250+
DependenciesPacemaker with corosync
PostgreSQL Support Matrix9.3 and above
Core LanguagePerl

Replication Manager (RepMgr)

Replication Manager enhances PostgreSQL’s built-in tools for replication. There are no specific setup requirements except for the fact that RepMgr is not supported on Windows. It requires creation of a database user and database. Through its utilities, it enhances setup, monitoring and administrative tasks including failovers. RepMgr has two core utilities; repmgr and repmgrd.

Key InformationDetails
Company2nd Quadrant
LicenseGPL v3
First ReleaseDecember 2010
GitHub URLhttps://github.com/2ndQuadrant/repmgr
GitHub Stars1.1k+
DependenciesSSH and rsync
PostgreSQL Support Matrix9.3, 9.4 with some restrictions; 9.5 and above
Core LanguageC

Patroni

Patroni is more like a template rather than a complete all-purpose solution that fits all your needs. It provides a way of engaging existing solutions with PostgreSQL to create a high availability setup. Setting up Patroni requires configuring various components. It starts with setting up psycopg2, configuration a DCS and pushing in configuration yaml files for PostgreSQL servers, and finally setting up HAProxy which provides a single way of connecting to the primary database server. Watchdog is additionally required to monitor system health.

Key InformationDetails
CompanyZolando
LicenseMIT
First ReleaseSeptember 2015
GitHub URLhttps://github.com/zalando/patroni
GitHub Stars3.3k+
DependenciesyamlPsycopg2Python module for the required DCS (Distributed Configuration System) including either etcd, zookeeper, exhibitor or Kubernetes.HA ProxyWatchdog
PostgreSQL Support MatrixNot available.
Core LanguagePython

Lots to Look Forward To

This was all about giving an overview on PostgreSQL high availability and setting the stage for my future work around high availability setup and evaluation. If the github stars is the criteria to go by, then Patroni knocks all other solutions out. However, to rely on stars to help us navigate our way through this tricky journey, we’d have to look at various other aspects including how the solutions have recently evolved, in particular, PgPool-II which has significantly improved its ability to handle an HA cluster.

Stay tuned as lots more is to come on PostgreSQL high availability solutions.